Abstract

We consider sample covariance matrices $S_N = \frac{1}{p}\Sigma_N^{1/2} X_N X_N^* \Sigma_N^{1/2}$, where $X_N$ is an $N \times p$ real or complex matrix with i.i.d. entries with finite 12th moment and $\Sigma_N$ is an $N \times N$ positive definite matrix. In addition we assume that the spectral measure of $\Sigma_N$ almost surely converges to some limiting probability distribution as $N \to \infty$ and $p/N \to \gamma > 0$. We quantify the relationship between sample and population eigenvectors by studying the asymptotics of functionals of the type $\frac{1}{N}\mathrm{Tr}\left(g(\Sigma_N)(S_N - zI)^{-1}\right)$, where $I$ is the identity matrix, $g$ is a bounded function and $z$ is a complex number. This is then used to compute the asymptotically optimal bias correction for sample eigenvalues, paving the way for a new generation of improved estimators of the covariance matrix and its inverse.



1 Introduction and Overview of the Main Results 
1.1 Model and results 

Consider $p$ independent samples $C_1, \dots, C_p$, all of which are $N \times 1$ real or complex vectors. In this paper, we are interested in the large-$N$ limiting spectral properties of the sample covariance matrix

$$S_N = \frac{1}{p} C C^*, \qquad C = [C_1, C_2, \dots, C_p],$$

when we assume that the sample size $p = p(N)$ satisfies $p/N \to \gamma$ as $N \to \infty$ for some $\gamma > 0$. This framework is known as large-dimensional asymptotics. Throughout the paper, $1$ denotes the indicator function of a set, and we make the following assumptions: $C = \Sigma_N^{1/2} X_N$ where

• $(H_1)$ $X_N$ is an $N \times p$ matrix of real or complex i.i.d. random variables with zero mean, unit variance, and 12th absolute central moment bounded by a constant $B$ independent of $N$ and $p$;

• $(H_2)$ the population covariance matrix $\Sigma_N$ is an $N$-dimensional random Hermitian positive definite matrix independent of $X_N$;

• $(H_3)$ $p/N \to \gamma > 0$ as $N \to \infty$;

• $(H_4)$ $(\tau_1, \dots, \tau_N)$ is a system of eigenvalues of $\Sigma_N$, and the empirical spectral distribution (e.s.d.) of the population covariance, given by $H_N(\tau) = \frac{1}{N}\sum_{j=1}^{N} 1_{[\tau_j,+\infty)}(\tau)$, converges a.s. to a nonrandom limit $H(\tau)$ at every point of continuity of $H$. $H$ defines a probability distribution function, whose support $\mathrm{Supp}(H)$ is included in the compact interval $[h_1, h_2]$ with $0 < h_1 \le h_2 < \infty$.

The aim of this paper is to investigate the asymptotic properties of the eigenvectors 
of such sample covariance matrices. In particular, we will quantify how the eigenvec- 
tors of the sample covariance matrix deviate from those of the population covariance 
matrix under large-dimensional asymptotics. This will enable us to characterize how 
the sample covariance matrix deviates as a whole (i.e. through its eigenvalues and its 
eigenvectors) from the population covariance matrix. Specifically, we will introduce 
bias-correction formulae for the eigenvalues of the sample covariance matrix that can 
lead, in future research, to improved estimators of the covariance matrix and its inverse. This will be developed in the discussion (Sections 1.2 and 1.3) following our main result, Theorem 1.2, stated below.

Before presenting our results, we briefly review some known results about the spectral properties of sample covariance matrices under large-dimensional asymptotics.

Throughout the paper we denote by $((\lambda_1^N, \dots, \lambda_N^N); (u_1^N, \dots, u_N^N))$ a system of eigenvalues and orthonormal eigenvectors of the sample covariance matrix $S_N = \frac{1}{p}\Sigma_N^{1/2} X_N X_N^* \Sigma_N^{1/2}$. Without loss of generality, we assume that the eigenvalues are sorted in decreasing order: $\lambda_1^N \ge \lambda_2^N \ge \dots \ge \lambda_N^N$. We also denote by $(v_1^N, \dots, v_N^N)$ a system of orthonormal eigenvectors of $\Sigma_N$. Superscripts will be omitted when no confusion is possible.

First, the asymptotic behavior of the eigenvalues is now quite well understood. The "global behavior" of the spectrum of $S_N$, for instance, is characterized through the e.s.d., defined as $F_N(\lambda) = N^{-1}\sum_{i=1}^{N} 1_{[\lambda_i,+\infty)}(\lambda)$, $\forall \lambda \in \mathbb{R}$. The e.s.d. is usually described through its Stieltjes transform. We recall that the Stieltjes transform of a nondecreasing function $G$ is defined by $m_G(z) = \int_{-\infty}^{+\infty} (\lambda - z)^{-1}\, dG(\lambda)$ for all $z$ in $\mathbb{C}^+$, where $\mathbb{C}^+ = \{z \in \mathbb{C},\ \mathrm{Im}(z) > 0\}$. The use of the Stieltjes transform is motivated by the following inversion formula: given any nondecreasing function $G$, one has that $G(b) - G(a) = \lim_{\eta \to 0^+} \pi^{-1} \int_a^b \mathrm{Im}\left[m_G(\xi + i\eta)\right] d\xi$, which holds if $G$ is continuous at $a$ and $b$.

The first fundamental result concerning the asymptotic global behavior of the spectrum was obtained by Marcenko and Pastur in [21]. Their result was later refined, e.g. in [4, 14, 16, 29, 30]. In the next theorem, we recall their result (which was actually proved in a more general setting than the one presented here) and quote the most recent version as given in [28].
Let $m_{F_N}(z) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{\lambda_i - z} = \frac{1}{N}\mathrm{Tr}\left[(S_N - zI)^{-1}\right]$, where $I$ denotes the $N \times N$ identity matrix.

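Theorem 1.1 ([21]). Under Assumptions $(H_1)$ to $(H_4)$, one has that for all $z \in \mathbb{C}^+$, $\lim_{N \to \infty} m_{F_N}(z) = m_F(z)$ a.s., where

$$\forall z \in \mathbb{C}^+, \quad m_F(z) = \int_{-\infty}^{+\infty} \left\{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\, m_F(z)\right] - z\right\}^{-1} dH(\tau). \tag{1}$$

Furthermore, the e.s.d. of the sample covariance matrix, given by $F_N(\lambda) = N^{-1}\sum_{i=1}^{N} 1_{[\lambda_i,+\infty)}(\lambda)$, converges a.s. to the nonrandom limit $F(\lambda)$ at all points of continuity of $F$.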
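Equation (1) is an implicit equation for $m_F(z)$, so it may help to see one way of solving it numerically. The following is a minimal sketch, not the authors' method: it assumes that $H$ is a discrete distribution (atoms `taus` with masses `weights`) and uses plain fixed-point iteration, which is only expected to behave well when $z$ is not too close to the real axis. The function names are hypothetical. When $H$ is a point mass at 1, Equation (1) reduces to a quadratic in $m_F(z)$, which gives a closed form to compare against.

```python
import cmath

def mp_stieltjes(z, taus, weights, gamma, n_iter=500):
    """Fixed-point iteration for Equation (1) with a discrete H (illustrative sketch).

    Iterates m <- sum_j w_j / (tau_j * (1 - 1/gamma - z*m/gamma) - z).
    Convergence is assumed, not guaranteed, for z close to the real axis.
    """
    c = 1.0 / gamma
    m = -1.0 / z  # starting guess: Stieltjes transforms behave like -1/z for large |z|
    for _ in range(n_iter):
        m = sum(w / (t * (1.0 - c - c * z * m) - z) for t, w in zip(taus, weights))
    return m

def mp_identity_closed_form(z, gamma):
    """H = point mass at 1: Equation (1) becomes c*z*m^2 - (1 - c - z)*m + 1 = 0.

    Returns the root with the larger imaginary part, taken here as the
    Stieltjes-transform branch.
    """
    c = 1.0 / gamma
    b = 1.0 - c - z
    disc = cmath.sqrt(b * b - 4.0 * c * z)
    roots = [(b + disc) / (2.0 * c * z), (b - disc) / (2.0 * c * z)]
    return max(roots, key=lambda r: r.imag)

if __name__ == "__main__":
    z = 1 + 1j
    print(mp_stieltjes(z, [1.0], [1.0], 2.0))
    print(mp_identity_closed_form(z, 2.0))
```

The two printed values should agree to numerical precision; for a general $H$ no closed form is available and only the iterative route remains.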

In addition, [11] show that the following limit exists:

$$\forall \lambda \in \mathbb{R} - \{0\}, \quad \lim_{z \in \mathbb{C}^+ \to \lambda} m_F(z) \equiv \breve{m}_F(\lambda). \tag{2}$$

They also prove that $F$ has a continuous derivative which is given by $F' = \pi^{-1}\,\mathrm{Im}[\breve{m}_F]$ on $(0, +\infty)$. More precisely, when $\gamma > 1$, $\lim_{z \in \mathbb{C}^+ \to \lambda} m_F(z) \equiv \breve{m}_F(\lambda)$ exists for all $\lambda \in \mathbb{R}$, $F$ has a continuous derivative $F'$ on all of $\mathbb{R}$, and $F(\lambda)$ is identically equal to zero in a neighborhood of $\lambda = 0$. When $\gamma < 1$, the proportion of sample eigenvalues equal to zero is asymptotically $1 - \gamma$. In this case, it is convenient to introduce the e.s.d. $\underline{F} = (1 - \gamma^{-1})\, 1_{[0,+\infty)} + \gamma^{-1} F$, which is the limit of the e.s.d. of the $p$-dimensional matrix $p^{-1} X_N^* \Sigma_N X_N$. Then $\lim_{z \in \mathbb{C}^+ \to \lambda} m_{\underline{F}}(z) \equiv \breve{m}_{\underline{F}}(\lambda)$ exists for all $\lambda \in \mathbb{R}$, $\underline{F}$ has a continuous derivative $\underline{F}'$ on all of $\mathbb{R}$, and $\underline{F}(\lambda)$ is identically equal to zero in a neighborhood of $\lambda = 0$. When $\gamma$ is exactly equal to one, further complications arise because the density of sample eigenvalues can be unbounded in a neighborhood of zero; for this reason we will sometimes have to rule out the possibility that $\gamma = 1$.
Further studies have complemented the a.s. convergence established by the Marcenko- 
Pastur theorem (see e.g. [1,5-7,9, 15,23] and [2] for more references). The Marcenko- 
Pastur equation has also generated a considerable amount of interest in statistics [13, 
19], finance [17, 18], signal processing [12], and other disciplines. We refer the interested reader to the recent book by Bai and Silverstein [8] for a thorough survey of this fast-growing field of research.

As we can gather from this brief review of the literature, the Marcenko-Pastur equa- 
tion reveals much of the behavior of the eigenvalues of sample covariance matrices 
under large-dimensional asymptotics. It is also of utmost interest to describe the 
asymptotic behavior of the eigenvectors. Such an issue is fundamental to statistics 
(for instance both eigenvalues and eigenvectors are of interest in Principal Components 
Analysis), communication theory (see e.g. [22]), wireless communication, and finance. The reader is referred to [3], Section 1 for more detail and to [10] for a statistical approach to the problem and a detailed exposition of statistical applications.
Actually, much less is known about eigenvectors of sample covariance matrices. In the special case where $\Sigma = I$ and the $X_{ij}$ are i.i.d. standard (real or complex) Gaussian random variables, it is well known that the matrix of sample eigenvectors is Haar distributed (on the orthogonal or unitary group). To our knowledge, these are the only ensembles for which the distribution of the eigenvectors is explicitly known. It has been conjectured that for a wide class of non-Gaussian ensembles, the matrix of sample eigenvectors should be "asymptotically Haar distributed", provided $\Sigma = I$. Note that the notion "asymptotically Haar distributed" needs to be defined. This question has been investigated by [25], [26], [27] followed by [3] and [22]. Therein a random matrix $U$ is said to be asymptotically Haar distributed if $Ux$ is asymptotically uniformly distributed on the unit sphere for any nonrandom unit vector $x$. [27] and [3] are then able to prove the conjecture under various sets of assumptions on the $X_{ij}$'s.
In the case where $\Sigma \neq I$, much less is known (see [3] and [22]). One expects that the distribution of the eigenvectors is far from being rotation-invariant. This is precisely the aspect with which this paper is concerned.



In this paper, we present another approach to study eigenvectors of sample covariance matrices. Roughly speaking, we study "functionals" of the type

$$\forall z \in \mathbb{C}^+, \quad \Theta_N^g(z) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{\lambda_i - z} \sum_{j=1}^{N} |u_i^* v_j|^2 \times g(\tau_j), \tag{3}$$

where $g$ is any real-valued univariate function satisfying suitable regularity conditions. By convention, $g(\Sigma_N)$ is the matrix with the same eigenvectors as $\Sigma_N$ and with eigenvalues $g(\tau_1), \dots, g(\tau_N)$. These functionals are generalizations of the Stieltjes transform used in the Marcenko-Pastur equation. Indeed, one can rewrite the Stieltjes transform of the e.s.d. as:

$$\forall z \in \mathbb{C}^+, \quad m_{F_N}(z) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{\lambda_i - z} \sum_{j=1}^{N} |u_i^* v_j|^2 \times 1. \tag{4}$$

The constant 1 that appears at the end of Equation (4) can be interpreted as a weighting scheme placed on the population eigenvectors: specifically, it represents a flat weighting scheme. The generalization we introduce here puts the spotlight on how the sample covariance matrix relates to the population covariance matrix, or even any function of the population covariance matrix.

Our main result is given in the following Theorem. 

Theorem 1.2. Assume that conditions $(H_1)$-$(H_4)$ are satisfied. Let $g$ be a (real-valued) bounded function defined on $[h_1, h_2]$ with finitely many points of discontinuity. Then there exists a nonrandom function $\Theta^g$ defined over $\mathbb{C}^+$ such that $\Theta_N^g(z) = N^{-1}\mathrm{Tr}\left[(S_N - zI)^{-1} g(\Sigma_N)\right]$ converges a.s. to $\Theta^g(z)$ for all $z \in \mathbb{C}^+$. Furthermore, $\Theta^g$ is given by:

$$\forall z \in \mathbb{C}^+, \quad \Theta^g(z) = \int_{-\infty}^{+\infty} \left\{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\, m_F(z)\right] - z\right\}^{-1} g(\tau)\, dH(\tau). \tag{5}$$

One can first observe that as we move from a flat weighting scheme of $g \equiv 1$ to any arbitrary weighting scheme $g(\tau_j)$, the integration kernel

$$\frac{1}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\, m_F(z)\right] - z}$$

remains unchanged. Therefore, our Equation (5) generalizes Marcenko and Pastur's foundational result. Actually, the proof of Theorem 1.2 follows from some of the arguments used in [28] to derive the Marcenko-Pastur equation. This proof is postponed until Section 2.
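To make the role of the weighting function $g$ concrete, here is a small numerical sketch of Equation (5) under stated assumptions: $H$ is taken to be discrete (the three-atom distribution used in the paper's simulations, with $\gamma = 2$), $m_F(z)$ is obtained by a naive fixed-point iteration at a point $z$ far from the real axis, and the function names are hypothetical. It checks two things: that $g \equiv 1$ gives back $m_F(z)$, and that $g(\tau) = \tau$ agrees with the closed form $\gamma^2/(\gamma - 1 - z\,m_F(z)) - \gamma$ asserted for $\Theta^{(1)}$ in Lemma 2.2.

```python
def solve_m(z, taus, weights, gamma, n_iter=500):
    """Naive fixed-point solve of the Marcenko-Pastur equation for a discrete H (sketch)."""
    c = 1.0 / gamma
    m = -1.0 / z
    for _ in range(n_iter):
        m = sum(w / (t * (1.0 - c - c * z * m) - z) for t, w in zip(taus, weights))
    return m

def theta_g(z, taus, weights, gamma, g, m):
    """Right-hand side of Equation (5): the same kernel as Equation (1), weighted by g(tau)."""
    c = 1.0 / gamma
    return sum(w * g(t) / (t * (1.0 - c - c * z * m) - z) for t, w in zip(taus, weights))

# Illustrative choice: H puts mass 0.2, 0.4, 0.4 on 1, 3, 10, and gamma = 2.
TAUS, WEIGHTS, GAMMA = [1.0, 3.0, 10.0], [0.2, 0.4, 0.4], 2.0
z = 10j
m = solve_m(z, TAUS, WEIGHTS, GAMMA)

flat = theta_g(z, TAUS, WEIGHTS, GAMMA, lambda t: 1.0, m)   # g = 1
linear = theta_g(z, TAUS, WEIGHTS, GAMMA, lambda t: t, m)   # g(tau) = tau
closed = GAMMA ** 2 / (GAMMA - 1.0 - z * m) - GAMMA

print(abs(flat - m), abs(linear - closed))
```

Both printed discrepancies should be at the level of floating-point rounding. Only the weight $g$ changes between the two calls; the kernel is shared.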

The generalization of the Marcenko-Pastur equation we propose allows us to consider a few unsolved problems regarding the overall relationship between sample and population covariance matrices. Let us consider two of these problems, which are investigated in more detail in the next two subsections.

The first of these questions is: how do the eigenvectors of the sample covariance matrix deviate from those of the population covariance matrix? By substituting functions $g$ of the form $1_{(-\infty,\tau)}$ into Equation (5), we quantify the asymptotic relationship between sample and population eigenvectors. This is developed in more detail in Section 1.2.
Another question is: how does the sample covariance matrix deviate from the population covariance matrix as a whole, and how can we modify it to bring it closer to the population covariance matrix? This is an important question in statistics, where a covariance matrix estimator that improves upon the sample covariance matrix is sought. By substituting the function $g(\tau) = \tau$ into Equation (5), we find the optimal asymptotic bias correction for the eigenvalues of the sample covariance matrix in Section 1.3. We also perform the same calculation for the inverse covariance matrix (an object of great interest in econometrics and finance), this time by taking $g(\tau) = 1/\tau$.
This list is not intended to be exhaustive. Other applications may hopefully be extracted from our generalized Marcenko-Pastur equation.

1.2 Sample vs. Population Eigenvectors 

As will be made more apparent in Equation (8) below, it is possible to quantify the asymptotic behavior of sample eigenvectors in the general case by selecting a function $g$ of the form $1_{(-\infty,\tau)}$ in Equation (5). Let us briefly explain why.
First of all, note that each sample eigenvector $u_i$ lies in a space whose dimension is growing towards infinity. Therefore, the only way to know "where" it lies is to project it onto a known orthonormal basis that will serve as a reference grid. Given the nature of the problem, the most meaningful choice for this reference grid is the orthonormal basis formed by the population eigenvectors $(v_1, \dots, v_N)$. Thus we are faced with the task of characterizing the asymptotic behavior of $u_i^* v_j$ for all $i, j = 1, \dots, N$, i.e. the projection of the sample eigenvectors onto the population eigenvectors. Yet as every eigenvector is identified up to multiplication by a scalar of modulus one, the argument (angle) of $u_i^* v_j$ is devoid of mathematical relevance. Therefore, we can focus instead on its square modulus $|u_i^* v_j|^2$ without loss of information.
Another issue that arises is that of scaling. Indeed, as

$$\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} |u_i^* v_j|^2 = \frac{1}{N^2}\sum_{i=1}^{N} u_i^* \left(\sum_{j=1}^{N} v_j v_j^*\right) u_i = \frac{1}{N^2}\sum_{i=1}^{N} u_i^* u_i = \frac{1}{N},$$

we study $N |u_i^* v_j|^2$ instead, so that its limit does not vanish under large-$N$ asymptotics.
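The scaling identity can be checked by hand in the smallest nontrivial case. The sketch below is purely illustrative (dimension $N = 2$, real orthonormal bases obtained by rotating the canonical basis by hypothetical angles `alpha` and `beta`): the squared overlaps sum to $N$, so their average over the $N^2$ pairs is $1/N$, and the rescaled quantities $N|u_i^* v_j|^2$ average to one.

```python
import math

def squared_overlaps(alpha, beta):
    """|u_i . v_j|^2 for two orthonormal bases of R^2, rotated by alpha and beta."""
    u = [(math.cos(alpha), math.sin(alpha)), (-math.sin(alpha), math.cos(alpha))]
    v = [(math.cos(beta), math.sin(beta)), (-math.sin(beta), math.cos(beta))]
    return [[(ui[0] * vj[0] + ui[1] * vj[1]) ** 2 for vj in v] for ui in u]

N = 2
overlaps = squared_overlaps(0.3, 1.1)
total = sum(sum(row) for row in overlaps)        # equals N
average_rescaled = N * total / (N * N)           # average of N * |u_i . v_j|^2

print(total, average_rescaled)
```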
The indexing of the eigenvectors also demands special attention as the dimension goes to infinity. We choose to use an indexing system where "eigenvalues serve as labels for eigenvectors", that is, $u_i$ is the eigenvector associated with the $i$th largest eigenvalue $\lambda_i$.

All these considerations lead us to introduce the following key object:

$$\forall \lambda, \tau \in \mathbb{R}, \quad \Phi_N(\lambda, \tau) = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} |u_i^* v_j|^2\, 1_{[\lambda_i,+\infty)}(\lambda) \times 1_{[\tau_j,+\infty)}(\tau). \tag{6}$$

This bivariate function is right continuous with left-hand limits and nondecreasing in each of its arguments. It also verifies $\lim_{\lambda \to -\infty,\, \tau \to -\infty} \Phi_N(\lambda, \tau) = 0$ and $\lim_{\lambda \to +\infty,\, \tau \to +\infty} \Phi_N(\lambda, \tau) = 1$. Therefore, it satisfies the properties of a bivariate cumulative distribution function.

Remark 1. Our function $\Phi_N$ can be compared with the object introduced in [3]: $\forall \lambda \in \mathbb{R}$, $F^{X_N}(\lambda) = \sum_{i=1}^{N} |u_i^* x_N|^2\, 1_{[\lambda_i,+\infty)}(\lambda)$, where $(x_N)_{N=1,2,\dots}$ is a sequence of nonrandom unit vectors satisfying the non-trivial condition $x_N^* (\Sigma_N - zI)^{-1} x_N \to m_H(z)$. This condition is specified so that projecting the sample eigenvectors onto $x_N$ effectively wipes out any signature of non-rotation-invariant behavior. The main difference is that $\Phi_N$ projects the sample eigenvectors onto the population eigenvectors instead.

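From $\Phi_N$ we can extract precise information about the sample eigenvectors. The average of the quantities of interest $N |u_i^* v_j|^2$ over the sample (resp. population) eigenvectors associated with the sample (resp. population) eigenvalues lying in the interval $[\underline{\lambda}, \overline{\lambda}]$ (resp. $[\underline{\tau}, \overline{\tau}]$) is equal to:

$$\frac{\Phi_N(\overline{\lambda}, \overline{\tau}) - \Phi_N(\overline{\lambda}, \underline{\tau}) - \Phi_N(\underline{\lambda}, \overline{\tau}) + \Phi_N(\underline{\lambda}, \underline{\tau})}{\left[F_N(\overline{\lambda}) - F_N(\underline{\lambda})\right] \times \left[H_N(\overline{\tau}) - H_N(\underline{\tau})\right]} \tag{7}$$

whenever the denominator is strictly positive. Since $\underline{\lambda}$ and $\overline{\lambda}$ (resp. $\underline{\tau}$ and $\overline{\tau}$) can be chosen arbitrarily close to each other (as long as the average in Equation (7) exists), our goal of characterizing the behavior of sample eigenvectors would be achieved in principle by determining the asymptotic behavior of $\Phi_N$. This can be deduced from Theorem 1.2 thanks to the inversion formula for the Stieltjes transform: for all $(\lambda, \tau) \in \mathbb{R}^2$ such that $\Phi_N$ is continuous at $(\lambda, \tau)$,

$$\Phi_N(\lambda, \tau) = \lim_{\eta \to 0^+} \frac{1}{\pi}\int_{-\infty}^{\lambda} \mathrm{Im}\left[\Theta_N^g(\xi + i\eta)\right] d\xi, \tag{8}$$

which holds in the special case where $g = 1_{(-\infty,\tau)}$. We are now ready to state our second main result.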

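Theorem 1.3. Assume that conditions $(H_1)$-$(H_4)$ hold true and let $\Phi_N(\lambda, \tau)$ be defined by (6). Then there exists a nonrandom bivariate function $\Phi$ such that $\Phi_N(\lambda, \tau) \xrightarrow{a.s.} \Phi(\lambda, \tau)$ at all points of continuity of $\Phi$. Furthermore, when $\gamma \neq 1$, the function $\Phi$ can be expressed as: $\forall (\lambda, \tau) \in \mathbb{R}^2$, $\Phi(\lambda, \tau) = \int_{-\infty}^{\lambda}\int_{-\infty}^{\tau} \varphi(l, t)\, dH(t)\, dF(l)$, where

$$\varphi(l, t) = \begin{cases} \dfrac{\gamma^{-1}\, l\, t}{(a t - l)^2 + b^2 t^2} & \text{if } l > 0, \\[2ex] \dfrac{1}{(1 - \gamma)\left[1 + \breve{m}_{\underline{F}}(0)\, t\right]} & \text{if } l = 0 \text{ and } \gamma < 1, \\[2ex] 0 & \text{otherwise,} \end{cases} \tag{9}$$

and $a$ (resp. $b$) is the real (resp. imaginary) part of $1 - \gamma^{-1} - \gamma^{-1}\, l\, \breve{m}_F(l)$.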

Equation (9) quantifies how the eigenvectors of the sample covariance matrix deviate from those of the population covariance matrix under large-dimensional asymptotics. The result is explicit as a function of $\breve{m}_F$.
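As a sanity check on the first branch of Equation (9), consider the rotation-invariant case where $H$ is a point mass at 1 (population covariance equal to the identity). There the sample eigenvectors should show no preference for any population direction, so one expects $\varphi(l, 1) = 1$ throughout the bulk of the spectrum. The sketch below verifies this numerically. It relies on an assumption external to the text above: in this special case Equation (1) is a quadratic, and $\breve{m}_F(l)$ is taken to be its root with positive imaginary part. Function names are hypothetical and only the branch $l > 0$ is implemented.

```python
import math

def m_breve_identity(l, gamma):
    """Boundary value of m_F at a real point l inside the bulk, for H = point mass at 1.

    Solves c*l*m^2 - (1 - c - l)*m + 1 = 0 with c = 1/gamma, keeping Im(m) > 0.
    """
    c = 1.0 / gamma
    b = 1.0 - c - l
    disc = 4.0 * c * l - b * b
    if disc <= 0.0:
        raise ValueError("l lies outside the bulk of the limiting spectrum")
    return complex(b, math.sqrt(disc)) / (2.0 * c * l)

def phi(l, t, m_l, gamma):
    """First branch (l > 0) of Equation (9)."""
    c = 1.0 / gamma
    w = 1.0 - c - c * l * m_l
    a, b = w.real, w.imag
    return c * l * t / ((a * t - l) ** 2 + (b * t) ** 2)

gamma = 2.0
values = [phi(l, 1.0, m_breve_identity(l, gamma), gamma) for l in (0.2, 1.0, 2.5)]
print(values)
```

For a non-degenerate $H$ the same `phi` function applies, but $\breve{m}_F$ then has to come from a numerical solution of the Marcenko-Pastur equation.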

To illustrate Theorem 1.3, we can pick any eigenvector of our choosing, for example the one that corresponds to the first (i.e. largest) eigenvalue, and plot how it projects onto the population eigenvectors (indexed by their corresponding eigenvalues). The resulting graph is shown in Figure 1. This is a plot of $\varphi(l, t)$ as a function of $t$, for fixed $l$ equal to the supremum of $\mathrm{Supp}(F)$. It is the asymptotic equivalent to plotting $N |u_1^* v_j|^2$ as a function of $\tau_j$. It looks like a density because, by construction, it must integrate to one. As soon as the sample size starts to drop below 10 times the number of variables, we can see that the first sample eigenvector starts deviating quite strongly from the first population eigenvectors. This should have precautionary implications for Principal Component Analysis (PCA), where the number of variables is often so large that it is difficult to make the sample size more than ten times bigger.

Figure 1: Projection of first sample eigenvector onto population eigenvectors (indexed by their associated eigenvalues). Horizontal axis: population eigenvectors indexed by their associated eigenvalues. We have taken $H' = 1_{[5,6]}$.

Obviously, Equation (9) would enable us to draw a similar graph for any sample 
eigenvector (not just the first one), and for any $\gamma$ and $H$ satisfying the assumptions of
Theorem 1.3. Preliminary investigations reveal some unexpected patterns. For exam- 
ple: one might have thought that the sample eigenvector associated with the median 
sample eigenvalue would be closest to the population eigenvector associated with the 
median population eigenvalue; but in general this is not true. 

1.3 Asymptotically Optimal Bias Correction for the Sample Eigenvalues

We now bring the two preceding results together to quantify the relationship between 
the sample covariance matrix and the population covariance matrix as a whole. As will 
be made clear in Equation (12) below, this is achieved by selecting the function $g(\tau) = \tau$
in Equation (5). The objective is to see how the sample covariance matrix deviates
from the population covariance matrix, and how we can modify it to bring it closer to 
the population covariance matrix. The main problem with the sample covariance matrix 
is that its eigenvalues are too dispersed: the smallest ones are biased downwards, and 
the largest ones upwards. This is most easily visualized when the population covariance 
matrix is the identity, in which case the limiting spectral e.s.d. F is known in closed 
form (see Figure 2). We can see that the smallest and the largest sample eigenvalues 
are biased away from one, and that the bias decreases in $\gamma$. Therefore, a key concern
in multivariate statistics is to find the asymptotically optimal bias correction for the 
eigenvalues of the sample covariance matrix. As this correction will tend to reduce the 
dispersion of the eigenvalues, it is often called a shrinkage formula. 

Ledoit and Wolf [20] made some progress along this direction by finding the optimal linear shrinkage formula for the sample eigenvalues (projecting on the two-dimensional subspace spanned by $S_N$ and $I$). However, shrinking the eigenvalues is a highly nonlinear problem (as Figure 3 below will illustrate). Therefore, there is strong reason to believe that finding the optimal nonlinear shrinkage formula for the sample eigenvalues would lead to a covariance matrix estimator that further improves upon the Ledoit-Wolf estimator. Theorem 1.2 paves the way for such a development.
To see how, let us think of the problem of estimating $\Sigma_N$ in general terms. In order to construct an estimator of $\Sigma_N$, we must in turn consider what the eigenvectors and the eigenvalues of this estimator should be. Let us consider the eigenvectors first. In the general case where we have no prior information about the orientation of the population eigenvectors, it is reasonable to require that the estimation procedure be invariant with respect to rotation by any $N$-dimensional orthogonal matrix $W$. If we rotate the variables by $W$, then we would ask our estimator to also rotate by the same orthogonal matrix $W$. The class of orthogonally invariant estimators of the covariance matrix consists of all the estimators that have the same eigenvectors as the sample covariance matrix (see [24], Lemma 5.3). Every rotation-invariant estimator of $\Sigma_N$ is thus of the form:

$$U_N D_N U_N^*, \quad \text{where } D_N = \mathrm{Diag}(d_1, \dots, d_N) \text{ is diagonal,}$$

and where $U_N$ is the matrix whose $i$th column is the sample eigenvector $u_i$. This is the class that we consider.

Figure 2: Limiting density of sample eigenvalues, in the particular case where all the eigenvalues of the population covariance matrix are equal to one. The graph shows excess dispersion of the sample eigenvalues. The formula for this plot comes from solving the Marcenko-Pastur equation for $H = 1_{[1,+\infty)}$.

Our objective is to find the matrix in this class that is closest to the population covariance matrix. In order to measure distance, we choose the Frobenius norm, defined as $\|A\|_F = \sqrt{\mathrm{Tr}(AA^*)}$ for any matrix $A$. Thus we end up with the following optimization problem: $\min_{D_N \text{ diagonal}} \|U_N D_N U_N^* - \Sigma_N\|_F$. Elementary matrix algebra shows that its solution is:

$$\tilde{D}_N = \mathrm{Diag}(\tilde{d}_1, \dots, \tilde{d}_N) \quad \text{where } \forall i = 1, \dots, N, \quad \tilde{d}_i = u_i^* \Sigma_N u_i.$$

The interpretation of $\tilde{d}_i$ is that it captures how the $i$th sample eigenvector $u_i$ relates to the population covariance matrix as a whole.
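For completeness, the "elementary matrix algebra" can be spelled out as follows. This is a short worked derivation added for the reader's convenience; it uses only the unitary invariance of the Frobenius norm and the fact that $U_N$ is unitary.

```latex
\begin{aligned}
\left\| U_N D_N U_N^* - \Sigma_N \right\|_F^2
  &= \left\| D_N - U_N^* \Sigma_N U_N \right\|_F^2
     && \text{(unitary invariance of } \|\cdot\|_F\text{)} \\
  &= \sum_{i=1}^{N} \left| d_i - u_i^* \Sigma_N u_i \right|^2
     + \sum_{i \neq j} \left| u_i^* \Sigma_N u_j \right|^2 .
\end{aligned}
```

The second sum does not depend on $D_N$, so the minimum is attained by setting each $d_i$ equal to $u_i^* \Sigma_N u_i$; the off-diagonal sum is the part of the loss that no estimator in this class can remove.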

While $U_N \tilde{D}_N U_N^*$ does not constitute a bona fide estimator (because it depends on the unobservable $\Sigma_N$), new estimators that seek to improve upon the existing ones will need to get as close to $U_N \tilde{D}_N U_N^*$ as possible. This is exactly the path that led Ledoit and Wolf [20] to their improved covariance matrix estimator. Therefore, it is important, in the interest of developing a new and improved estimator, to characterize the asymptotic behavior of $\tilde{d}_i$ ($i = 1, \dots, N$). The key object that will enable us to achieve this goal is the nondecreasing function defined by:

$$\forall x \in \mathbb{R}, \quad \Delta_N(x) = \frac{1}{N}\sum_{i=1}^{N} \tilde{d}_i\, 1_{[\lambda_i,+\infty)}(x) = \frac{1}{N}\sum_{i=1}^{N} u_i^* \Sigma_N u_i \times 1_{[\lambda_i,+\infty)}(x). \tag{10}$$

When all the sample eigenvalues are distinct, it is straightforward to recover the $\tilde{d}_i$'s from $\Delta_N$:

$$\forall i = 1, \dots, N, \quad \tilde{d}_i = \lim_{\varepsilon \to 0^+} \frac{\Delta_N(\lambda_i + \varepsilon) - \Delta_N(\lambda_i - \varepsilon)}{F_N(\lambda_i + \varepsilon) - F_N(\lambda_i - \varepsilon)}. \tag{11}$$

The asymptotic behavior of $\Delta_N$ can be deduced from Theorem 1.2 in the special case where $g(\tau) = \tau$: for all $x \in \mathbb{R}$ such that $\Delta_N$ is continuous at $x$,

$$\Delta_N(x) = \lim_{\eta \to 0^+} \frac{1}{\pi}\int_{-\infty}^{x} \mathrm{Im}\left[\Theta_N^g(\xi + i\eta)\right] d\xi, \quad g(\tau) \equiv \tau. \tag{12}$$

We are now ready to state our third main result.

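Theorem 1.4. Assume that conditions $(H_1)$-$(H_4)$ hold true and let $\Delta_N$ be defined as in (10). There exists a nonrandom function $\Delta$ defined over $\mathbb{R}$ such that $\Delta_N(x)$ converges a.s. to $\Delta(x)$ for all $x \in \mathbb{R} - \{0\}$. If in addition $\gamma \neq 1$, then $\Delta$ can be expressed as: $\forall x \in \mathbb{R}$, $\Delta(x) = \int_{-\infty}^{x} \delta(\lambda)\, dF(\lambda)$, where

$$\forall \lambda \in \mathbb{R}, \quad \delta(\lambda) = \begin{cases} \dfrac{\lambda}{\left|1 - \gamma^{-1} - \gamma^{-1} \lambda\, \breve{m}_F(\lambda)\right|^2} & \text{if } \lambda > 0, \\[2ex] \dfrac{\gamma}{(1 - \gamma)\, \breve{m}_{\underline{F}}(0)} & \text{if } \lambda = 0 \text{ and } \gamma < 1, \\[2ex] 0 & \text{otherwise.} \end{cases} \tag{13}$$

By Equation (11) the asymptotic quantity that corresponds to $\tilde{d}_i = u_i^* \Sigma_N u_i$ is $\delta(\lambda)$, provided that $\lambda$ corresponds to $\lambda_i$. Therefore, the way to get closest to the population covariance matrix (according to the Frobenius norm) would be to divide each sample eigenvalue $\lambda_i$ by the correction factor $\left|1 - \gamma^{-1} - \gamma^{-1} \lambda_i\, \breve{m}_F(\lambda_i)\right|^2$. This is what we call the optimal nonlinear shrinkage formula or asymptotically optimal bias correction (see footnote 1). Figure 3 shows how much it differs from Ledoit and Wolf's [20] optimal linear shrinkage formula. In addition, when $\gamma < 1$, the sample eigenvalues equal to zero need to be replaced by $\delta(0) = \gamma / \left[(1 - \gamma)\, \breve{m}_{\underline{F}}(0)\right]$.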
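The shrinkage rule of Equation (13) is easy to state in code once $\breve{m}_F$ is available. The following is an illustrative sketch rather than an estimator: it works in the oracle setting where the population covariance matrix is the identity, so that $\breve{m}_F$ has a closed form (the root with positive imaginary part of the quadratic that Equation (1) reduces to), and every sample eigenvalue in the bulk should be shrunk to exactly 1. The value $\breve{m}_{\underline{F}}(0) = \gamma/(1-\gamma)$ used for the zero eigenvalues is a standard Marcenko-Pastur fact for the identity case, introduced here as an assumption; all function names are hypothetical.

```python
import math

def m_breve_identity(lam, gamma):
    """Boundary value of m_F at real lam in the bulk, for population covariance = identity."""
    c = 1.0 / gamma
    b = 1.0 - c - lam
    disc = 4.0 * c * lam - b * b
    if disc <= 0.0:
        raise ValueError("lam lies outside the bulk of the limiting spectrum")
    return complex(b, math.sqrt(disc)) / (2.0 * c * lam)

def delta_bulk(lam, m_lam, gamma):
    """Equation (13), branch lam > 0: divide lam by |1 - 1/gamma - lam*m/gamma|^2."""
    c = 1.0 / gamma
    return lam / abs(1.0 - c - c * lam * m_lam) ** 2

def delta_zero(gamma, m_underline_f_0):
    """Equation (13), branch lam = 0 and gamma < 1."""
    return gamma / ((1.0 - gamma) * m_underline_f_0)

gamma = 2.0
sample_eigs = [0.3, 1.0, 2.0]  # hypothetical sample eigenvalues inside the bulk
shrunk = [delta_bulk(l, m_breve_identity(l, gamma), gamma) for l in sample_eigs]

gamma_small = 0.5
zero_replacement = delta_zero(gamma_small, gamma_small / (1.0 - gamma_small))

print(shrunk, zero_replacement)
```

In this degenerate case the correction factor equals $\lambda$ itself, which is why all dispersion disappears. With a non-trivial $H$ the factor varies nonlinearly across the spectrum.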

In a statistical context of estimation, $\breve{m}_F(\lambda_i)$ and $\breve{m}_{\underline{F}}(0)$ are not known, so they need to be replaced by $\breve{m}_{\hat{F}}(\lambda_i)$ and $\breve{m}_{\underline{\hat{F}}}(0)$ respectively, where $\hat{F}$ is some estimator of the limiting p.d.f. of sample eigenvalues. Research is currently underway to prove that a covariance matrix estimator constructed in this manner has desirable properties under large-dimensional asymptotics.

Footnote 1: This approach cannot possibly generate a consistent estimator of the population covariance matrix according to the Frobenius norm when $\gamma$ is finite. At best, it could generate a consistent estimator of the projection of the population covariance matrix onto the space of matrices that have the same eigenvectors as the sample covariance matrix.

Figure 3: Comparison of the Optimal Linear vs. Nonlinear Bias Correction Formulae. In this example, the distribution of population eigenvalues $H$ places 20% mass at 1, 40% mass at 3 and 40% mass at 10. The solid line plots $\delta(\lambda)$ as a function of $\lambda$.

A recent paper [13] introduced an algorithm for deducing the population eigenvalues from the sample eigenvalues using the Marcenko-Pastur equation. But our objective is quite different, as it is not the population eigenvalues $\tau_i = v_i^* \Sigma_N v_i$ that we seek, but instead the quantities $\tilde{d}_i = u_i^* \Sigma_N u_i$, which represent the diagonal entries of the orthogonal projection (according to the Frobenius norm) of the population covariance matrix onto the space of matrices that have the same eigenvectors as the sample covariance matrix. Therefore the algorithm in [13] is better suited for estimating the population eigenvalues themselves, whereas our approach is better suited for estimating the population covariance matrix as a whole.

Monte-Carlo simulations indicate that applying this bias correction is highly beneficial, even in small samples. We ran 10,000 simulations based on the distribution of population eigenvalues $H$ that places 20% mass at 1, 40% mass at 3 and 40% mass at 10. We kept $\gamma$ constant at 2 while increasing the number of variables from 5 to 100. For each set of simulations, we computed the Percentage Relative Improvement in Average Loss (PRIAL). The PRIAL of an estimator $M$ of $\Sigma_N$ is defined as

$$\mathrm{PRIAL}(M) = 100 \times \left[1 - \frac{E\left\|M - U_N \tilde{D}_N U_N^*\right\|_F^2}{E\left\|S_N - U_N \tilde{D}_N U_N^*\right\|_F^2}\right].$$

By construction, the PRIAL of the sample covariance matrix $S_N$ (resp. of $U_N \tilde{D}_N U_N^*$) is 0% (resp. 100%), meaning no improvement (resp. meaning maximum attainable improvement). For each of the 10,000 Monte-Carlo simulations, we consider $\tilde{S}_N$, which is the matrix obtained from the sample covariance matrix by keeping its eigenvectors and dividing its $i$th eigenvalue by the correction factor $\left|1 - \gamma^{-1} - \gamma^{-1} \lambda_i\, \breve{m}_F(\lambda_i)\right|^2$. The expected loss $E\left\|\tilde{S}_N - U_N \tilde{D}_N U_N^*\right\|_F^2$ is estimated by computing its average across the 10,000 Monte-Carlo simulations. Figure 4 plots the PRIAL obtained in this way, that is, by applying the optimal nonlinear shrinkage formula to the sample eigenvalues. We can see that, even with a modest sample size like $p = 40$, we already get 95% of the maximum possible improvement.

A similar formula can be obtained for the purpose of estimating the inverse of the population covariance matrix. To this aim, we set $g(\tau) = 1/\tau$ in Equation (5) and define

$$\Psi_N(x) := N^{-1}\sum_{i=1}^{N} u_i^* \Sigma_N^{-1} u_i \times 1_{[\lambda_i,+\infty)}(x), \quad \forall x \in \mathbb{R}.$$

Theorem 1.5. Assume that conditions $(H_1)$-$(H_4)$ are satisfied. There exists a nonrandom function $\Psi$ defined over $\mathbb{R}$ such that $\Psi_N(x)$ converges a.s. to $\Psi(x)$ for all $x \in \mathbb{R} - \{0\}$. If in addition $\gamma \neq 1$, then $\Psi$ can be expressed as: $\forall x \in \mathbb{R}$, $\Psi(x) = \int_{-\infty}^{x} \psi(\lambda)\, dF(\lambda)$, where

$$\forall \lambda \in \mathbb{R}, \quad \psi(\lambda) = \begin{cases} \dfrac{1 - \gamma^{-1} - 2\gamma^{-1} \lambda\, \mathrm{Re}\left[\breve{m}_F(\lambda)\right]}{\lambda} & \text{if } \lambda > 0, \\[2ex] \dfrac{1}{1 - \gamma}\, \breve{m}_H(0) - \breve{m}_{\underline{F}}(0) & \text{if } \lambda = 0 \text{ and } \gamma < 1, \\[2ex] 0 & \text{otherwise.} \end{cases} \tag{14}$$

Therefore, the way to get closest to the inverse of the population covariance matrix (according to the Frobenius norm) would be to multiply the inverse of each sample eigenvalue $\lambda_i^{-1}$ by the correction factor $1 - \gamma^{-1} - 2\gamma^{-1} \lambda_i\, \mathrm{Re}\left[\breve{m}_F(\lambda_i)\right]$. This represents the optimal nonlinear shrinkage formula (or asymptotically optimal bias correction) for the purpose of estimating the inverse covariance matrix. Again, in a statistical context of estimation, the unknown $\breve{m}_F(\lambda_i)$ needs to be replaced by $\breve{m}_{\hat{F}}(\lambda_i)$, where $\hat{F}$ is some estimator of the limiting p.d.f. of sample eigenvalues. This question is investigated in work in progress.



The rest of the paper is organized as follows. Section 2 contains the proof of Theorem 1.2. Section 3 contains the proof of Theorem 1.3. Section 4 is devoted to the proofs of Theorems 1.4 and 1.5.

Figure 4: Percentage Relative Improvement in Average Loss (PRIAL) from applying the optimal nonlinear shrinkage formula to the sample eigenvalues (panel title: $\gamma = 2$; horizontal axis: sample size). The solid line shows the PRIAL obtained by dividing the $i$th sample eigenvalue by the correction factor $\left|1 - \gamma^{-1} - \gamma^{-1} \lambda_i\, \breve{m}_F(\lambda_i)\right|^2$, as a function of sample size. The dotted line shows the PRIAL of the Ledoit-Wolf [20] linear shrinkage estimator. For each sample size we generated 10,000 Monte-Carlo simulations using the multivariate Gaussian distribution. As in Figure 3, we used $\gamma = 2$ and the distribution of population eigenvalues $H$ placing 20% mass at 1, 40% mass at 3 and 40% mass at 10.

2 Proof of Theorem 1.2 

The proof of Theorem 1.2 follows from an extension of the usual proof of the Marcenko-Pastur theorem (see e.g. [28] and [4]). The latter is based on the Stieltjes transform and, essentially, on a recursion formula. First, we slightly modify this proof to consider more general functionals $\Theta_N^g$ for some polynomial functions $g$. Then we use a standard approximation scheme to extend Theorem 1.2 to more general functions $g$.
First we need to adapt a lemma from Bai and Silverstein [4].

Lemma 2.1. Let $Y = (y_1, \dots, y_N)$ be a random vector with i.i.d. entries satisfying:

$$E y_1 = 0, \quad E|y_1|^2 = 1, \quad E|y_1|^{12} \le B,$$

where the constant $B$ does not depend on $N$. Let also $A$ be a given $N \times N$ matrix. Then there exists a constant $K > 0$ independent of $N$, $A$ and $Y$ such that:

$$E\left|Y A Y^* - \mathrm{Tr}(A)\right|^6 \le K \|A\|^6 N^3.$$

Proof of Lemma 2.1. The proof of Lemma 2.1 directly follows from that of Lemma 3.1 in [4]. Therein the assumption that $E|y_1|^{12} \le B$ is replaced with the assumption that $|y_1| \le \ln N$. One can easily check that all their arguments carry through if one assumes that the twelfth moment of $y_1$ is uniformly bounded. □

Next, we need to introduce some notation. We set $R_N(z) = (S_N - zI)^{-1}$ and define $\Theta_N^{(k)}(z) = N^{-1}\mathrm{Tr}\left[R_N(z) \Sigma_N^k\right]$ for all $z \in \mathbb{C}^+$ and integer $k$. Thus, $\Theta_N^{(k)} = \Theta_N^g$ if we take $g(\tau) = \tau^k$, $\forall \tau \in \mathbb{R}$. In particular, $\Theta_N^{(0)} = m_{F_N}$. To avoid confusion, the dependency of most of the variables on $N$ will occasionally be dropped from the notation. All convergence statements will be as $N \to \infty$. Conditions $(H_1)$-$(H_4)$ are assumed to hold throughout.

Lemma 2.2. One has that $\forall z \in \mathbb{C}^+$, $\Theta_N^{(1)}(z) \xrightarrow{a.s.} \Theta^{(1)}(z)$, where:

$$\Theta^{(1)}(z) = \frac{\gamma^2}{\gamma - 1 - z\, m_F(z)} - \gamma.$$

Proof of Lemma 2.2 In the first part of the proof, we show that

$$1 + z\,m_{F_N}(z) = \frac{p}{N} - \frac{1}{N}\sum_{k=1}^{p}\frac{1}{1 + (N/p)\,\Theta_N^{(1)}(z)} + o(1).$$

Using the a.s. convergence of the Stieltjes transform $m_{F_N}(z)$, it is then easy to deduce the equation satisfied by $\Theta^{(1)}$ in Lemma 2.2. Our proof closely follows some of the ideas of [28] and [4]. Therein the convergence of the Stieltjes transform $m_{F_N}(z)$ is investigated.

Let us define $C_k = p^{-1/2}\,\Sigma_N^{1/2} X_{\cdot k}$, where $X_{\cdot k}$ is the $k$th column of $X$. Then $S_N = \sum_{k=1}^{p} C_k C_k^*$. Using the identity $S_N - zI + zI = \sum_{k=1}^{p} C_k C_k^*$, one deduces that

$$\frac{1}{N}\mathrm{Tr}\left(I + zR_N(z)\right) = \frac{1}{N}\sum_{k=1}^{p} C_k^* R_N(z) C_k. \tag{15}$$

Define now for any integer $1 \le k \le p$

$$R_N^{(k)}(z) := \left(S_N - C_k C_k^* - zI\right)^{-1}.$$

By the resolvent identity $R_N(z) - R_N^{(k)}(z) = -R_N(z)\,C_k C_k^*\,R_N^{(k)}(z)$, we deduce that

$$C_k^* R_N(z) C_k - C_k^* R_N^{(k)}(z) C_k = -\,C_k^* R_N(z) C_k \; C_k^* R_N^{(k)}(z) C_k,$$

which finally gives that

$$C_k^* R_N(z) C_k = \frac{C_k^* R_N^{(k)}(z) C_k}{1 + C_k^* R_N^{(k)}(z) C_k} = 1 - \frac{1}{1 + C_k^* R_N^{(k)}(z) C_k}.$$

Plugging the latter formula into (15), one can write that

$$1 + z\,m_{F_N}(z) = \frac{p}{N} - \frac{1}{N}\sum_{k=1}^{p}\frac{1}{1 + C_k^* R_N^{(k)}(z) C_k}. \tag{16}$$

We will now use the fact that $R_N^{(k)}$ and $C_k$ are independent random matrices to estimate the asymptotic behavior of the last sum in (16). Using Lemma 2.1, we deduce that

$$\max_{k \in \{1,\dots,p\}} \left| C_k^* R_N^{(k)} C_k - \frac{1}{p}\mathrm{Tr}\left(R_N^{(k)}\Sigma_N\right) \right| \xrightarrow{a.s.} 0 \tag{17}$$

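as $N \to \infty$. Furthermore, using Lemma 2.6 in Silverstein and Bai (1995), one also has that

$$\frac{1}{p}\left|\mathrm{Tr}\left[\left(R_N - R_N^{(k)}\right)\Sigma_N\right]\right| \le \frac{\|\Sigma_N\|}{p\,\mathrm{Im}(z)}. \tag{18}$$

Thus using (18), (17) and (16), one can write that

$$1 + z\,m_{F_N}(z) = \frac{p}{N} - \frac{1}{N}\sum_{k=1}^{p}\frac{1}{1 + (N/p)\,\Theta_N^{(1)}(z)} + \delta_N, \tag{19}$$

where the error term $\delta_N$ is given by $\delta_N = \delta_N^1 + \delta_N^2$ with

$$\delta_N^1 = \frac{1}{N}\sum_{k=1}^{p}\frac{\frac{1}{p}\mathrm{Tr}\left[\left(R_N^{(k)} - R_N\right)\Sigma_N\right]}{\left(1 + \frac{1}{p}\mathrm{Tr}(R_N\Sigma_N)\right)\left(1 + \frac{1}{p}\mathrm{Tr}(R_N^{(k)}\Sigma_N)\right)}$$

and

$$\delta_N^2 = \frac{1}{N}\sum_{k=1}^{p}\frac{C_k^* R_N^{(k)} C_k - \frac{1}{p}\mathrm{Tr}\left(R_N^{(k)}\Sigma_N\right)}{\left(1 + \frac{1}{p}\mathrm{Tr}(R_N^{(k)}\Sigma_N)\right)\left(1 + C_k^* R_N^{(k)} C_k\right)}.$$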

We will now use (18) and (17) to show that $\delta_N$ a.s. converges to $0$ as $N \to \infty$. It is known that $F_N$ converges a.s. to the distribution $F$ given by the Marčenko-Pastur equation (and has no subsequence vaguely convergent to 0). It is proven in Silverstein and Bai (1995) that under these assumptions, there exists $m > 0$ such that $\inf_N F_N([-m, m]) > 0$. In particular, there exists $\delta > 0$ such that



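$$\inf_N \mathrm{Im}\int \frac{1}{\lambda - z}\,dF_N(\lambda) \;\ge\; \inf_N \int \frac{\eta}{2\lambda^2 + 2x^2 + \eta^2}\,dF_N(\lambda) \;\ge\; \delta,$$

where $z = x + i\eta$. From this, we deduce that

$$\left|1 + \frac{1}{p}\mathrm{Tr}(\Sigma_N R_N)\right| \;\ge\; \mathrm{Im}\left[\frac{1}{p}\mathrm{Tr}(\Sigma_N R_N)\right]$$

is bounded below by a positive multiple of $\delta$. Using (18) we also get that the same holds for $\left|1 + \frac{1}{p}\mathrm{Tr}(\Sigma_N R_N^{(k)})\right|$, uniformly in $k$, for $N$ large enough.

We first consider $\delta_N^1$. Each numerator in $\delta_N^1$ is bounded by (18) and each denominator is bounded away from zero. Thus one has that

$$\left|\delta_N^1\right| = O(1/N). \tag{20}$$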



We now turn to $\delta_N^2$. Using the a.s. convergence (17), it is not hard to deduce that

$$\delta_N^2 \to 0, \quad \text{a.s.}$$

This completes the proof of Lemma 2.2. □
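The two ingredients of this proof can be checked numerically. The sketch below, which is only an illustration under assumed parameters (a diagonal $\Sigma_N$ with eigenvalues 1 and 3, Gaussian entries, $N = 300$, $p = 600$, $z = 1 + i$, and the hypothetical function name `simulate`), verifies the exact rank-one identity used before (16) and compares $\Theta_N^{(1)}(z)$ with the finite-$N$ analogue of the limit in Lemma 2.2.

```python
import numpy as np

def simulate(N=300, p=600, z=1.0 + 1.0j, seed=0):
    """Return (lhs, rhs, theta1, theta1_pred) for one simulated sample."""
    rng = np.random.default_rng(seed)
    # Assumed population spectrum: half the eigenvalues equal 1, half equal 3.
    tau = np.where(np.arange(N) < N // 2, 1.0, 3.0)
    X = rng.standard_normal((N, p))
    C = np.sqrt(tau)[:, None] * X / np.sqrt(p)   # columns C_k = p^{-1/2} Sigma^{1/2} X_k
    S = C @ C.T                                  # S_N = sum_k C_k C_k^*
    I = np.eye(N)
    R = np.linalg.inv(S - z * I)                 # resolvent R_N(z)

    # Exact identity C_k^* R_N C_k = q / (1 + q), with q = C_k^* R_N^{(k)} C_k.
    c = C[:, 0]
    Rk = np.linalg.inv(S - np.outer(c, c) - z * I)
    lhs = c @ R @ c
    q = c @ Rk @ c
    rhs = q / (1.0 + q)

    # Finite-N analogue of Lemma 2.2, using p/N in place of gamma.
    gamma = p / N
    m = np.trace(R) / N                          # m_{F_N}(z)
    theta1 = np.sum(np.diag(R) * tau) / N        # N^{-1} Tr[R_N(z) Sigma_N]
    theta1_pred = gamma ** 2 / (gamma - 1.0 - z * m) - gamma
    return lhs, rhs, theta1, theta1_pred
```

The first pair should agree to machine precision, since the identity is exact; the second pair should agree only up to an error that vanishes as $N$ grows.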



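Lemma 2.3. For every $k = 1, 2, \dots$ the limit $\lim_{N\to\infty}\Theta_N^{(k)}(z) := \Theta^{(k)}(z)$ exists and satisfies the recursion equation

$$\forall z \in \mathbb{C}^+, \quad \Theta^{(k+1)}(z) = \left[z\,\Theta^{(k)}(z) + \int \tau^k\,dH(\tau)\right] \times \left[1 + \frac{1}{\gamma}\Theta^{(1)}(z)\right]. \tag{21}$$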



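Proof of Lemma 2.3 The proof is inductive, so we assume that formula (21) holds for any integer smaller than or equal to $q$ for some given integer $q$. We start from the formula

$$\mathrm{Tr}\left(\Sigma_N^q + z\,\Sigma_N^q R_N(z)\right) = \mathrm{Tr}\left(\Sigma_N^q R_N(z) S_N\right) = \sum_{k=1}^{p} C_k^* \Sigma_N^q R_N(z) C_k.$$

Using once more the resolvent identity, one gets that

$$C_k^* \Sigma_N^q R_N(z) C_k = \frac{C_k^* \Sigma_N^q R_N^{(k)}(z) C_k}{1 + C_k^* R_N^{(k)}(z) C_k},$$

which yields that

$$\frac{1}{N}\mathrm{Tr}\left(\Sigma_N^q + z\,\Sigma_N^q R_N(z)\right) = \frac{1}{N}\sum_{k=1}^{p}\frac{C_k^* \Sigma_N^q R_N^{(k)}(z) C_k}{1 + C_k^* R_N^{(k)}(z) C_k}. \tag{22}$$

It is now an easy consequence of the arguments developed in the case where $q = 0$ to check that

$$\max_{k \in \{1,\dots,p\}} \left| C_k^* R_N^{(k)}(z) C_k - \frac{1}{p}\mathrm{Tr}\left(\Sigma_N R_N(z)\right) \right| + \left| C_k^* \Sigma_N^q R_N^{(k)}(z) C_k - \frac{1}{p}\mathrm{Tr}\left(\Sigma_N^{q+1} R_N(z)\right) \right|$$

converges a.s. to zero. Using the recursion assumption that $\lim_{N\to\infty}\Theta_N^{(k)}(z)$ exists, $\forall k \le q$, one can deduce that $\lim_{N\to\infty}\Theta_N^{(q+1)}(z)$ exists and that the limit $\Theta^{(q+1)}(z)$ satisfies

$$\left[z\,\Theta^{(q)}(z) + \int \tau^q\,dH(\tau)\right] \times \left[1 + \frac{1}{\gamma}\Theta^{(1)}(z)\right] = \Theta^{(q+1)}(z).$$

This finishes the proof of Lemma 2.3. □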
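A finite-$N$ sanity check of recursion (21) can be sketched as follows. The population spectrum (eigenvalues 1 and 2), the Gaussian entries, the dimensions, the point $z = 1 + 2i$ and the name `recursion_gap` are assumptions chosen for illustration; $\gamma$ and $\int \tau^k dH(\tau)$ are replaced by their finite-sample counterparts $p/N$ and $N^{-1}\mathrm{Tr}(\Sigma_N^k)$.

```python
import numpy as np

def recursion_gap(k, N=400, p=1200, z=1.0 + 2.0j, seed=1):
    """|Theta_N^{(k+1)} - [z Theta_N^{(k)} + mean(tau^k)] [1 + Theta_N^{(1)} / gamma]|."""
    rng = np.random.default_rng(seed)
    tau = np.where(np.arange(N) < N // 2, 1.0, 2.0)   # assumed diagonal Sigma_N
    X = rng.standard_normal((N, p))
    C = np.sqrt(tau)[:, None] * X / np.sqrt(p)
    R = np.linalg.inv(C @ C.T - z * np.eye(N))        # R_N(z)
    r = np.diag(R)

    def theta(j):
        # Theta_N^{(j)}(z) = N^{-1} Tr[R_N(z) Sigma_N^j] for diagonal Sigma_N
        return np.sum(r * tau ** j) / N

    gamma = p / N
    predicted = (z * theta(k) + np.mean(tau ** k)) * (1.0 + theta(1) / gamma)
    return float(abs(theta(k + 1) - predicted))
```

The gap is not exactly zero at finite $N$; it should shrink as $N$ and $p$ grow with $p/N$ fixed.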

Lemma 2.4. Theorem 1.2 holds when the function $g$ is a polynomial.

Proof of Lemma 2.4 Given the linearity of the problem, it is sufficient to prove that Theorem 1.2 holds when the function $g$ is of the form: $\forall \tau \in \mathbb{R}$, $g(\tau) = \tau^k$, for any nonnegative integer $k$. In the case where $k = 0$, this is a direct consequence of Theorem 1.1 in [28].

The existence of a function $\Theta^{(k)}$ defined on $\mathbb{C}^+$ such that $\Theta_N^{(k)}(z) \xrightarrow{a.s.} \Theta^{(k)}(z)$ for all $z \in \mathbb{C}^+$ is established by Lemma 2.2 for $k = 1$ and by Lemma 2.3 for $k = 2, 3, \dots$ Therefore, all that remains to be shown is that Equation (5) holds for $k = 1, 2, \dots$

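We will first show it for $k = 1$. From the original Marčenko-Pastur equation we know that:

$$1 + z\,m_F(z) = \int_{-\infty}^{+\infty}\frac{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right]}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z}\,dH(\tau). \tag{23}$$

From Lemma 2.2 we know that

$$\Theta^{(1)}(z) = \frac{\gamma^2}{\gamma - 1 - z\,m_F(z)} - \gamma = \frac{1 + z\,m_F(z)}{1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)},$$

yielding that

$$1 + z\,m_F(z) = \frac{\Theta^{(1)}(z)}{1 + \gamma^{-1}\Theta^{(1)}(z)}. \tag{24}$$

Combining Equations (23) and (24) yields:

$$\frac{\Theta^{(1)}(z)}{1 + \gamma^{-1}\Theta^{(1)}(z)} = \int_{-\infty}^{+\infty}\frac{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right]}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z}\,dH(\tau). \tag{25}$$

From Lemma 2.2, we also know that:

$$1 + \gamma^{-1}\Theta^{(1)}(z) = \frac{1}{1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)}. \tag{26}$$

Putting together Equations (25) and (26) yields the simplification:

$$\Theta^{(1)}(z) = \int_{-\infty}^{+\infty}\frac{1}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z}\,\tau\,dH(\tau),$$

which establishes that Equation (5) holds when $g(\tau) = \tau$, $\forall \tau \in \mathbb{R}$.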

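We now show by induction that Equation (5) holds when $g(\tau) = \tau^k$ for $k = 2, 3, \dots$ Assume that we have proven it for $k - 1$. Thus the recursion hypothesis is that:

$$\Theta^{(k-1)}(z) = \int_{-\infty}^{+\infty}\frac{1}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z}\,\tau^{k-1}\,dH(\tau). \tag{27}$$

From Lemma 2.3 we know that

$$\Theta^{(k)}(z) = \left[z\,\Theta^{(k-1)}(z) + \int_{-\infty}^{+\infty}\tau^{k-1}\,dH(\tau)\right] \times \left[1 + \frac{1}{\gamma}\Theta^{(1)}(z)\right]. \tag{28}$$

Combining Equations (27) and (28) yields:

$$\frac{\Theta^{(k)}(z)}{1 + \gamma^{-1}\Theta^{(1)}(z)} = \int_{-\infty}^{+\infty}\frac{z}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z}\,\tau^{k-1}\,dH(\tau) + \int_{-\infty}^{+\infty}\tau^{k-1}\,dH(\tau) = \int_{-\infty}^{+\infty}\frac{1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z}\,\tau^{k}\,dH(\tau). \tag{29}$$

Putting together Equations (26) and (29) yields the simplification:

$$\Theta^{(k)}(z) = \int_{-\infty}^{+\infty}\frac{1}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z}\,\tau^{k}\,dH(\tau),$$

which proves that the desired assertion holds for $k$. Therefore, by induction, it holds for all $k = 1, 2, 3, \dots$ This completes the proof of Lemma 2.4. □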
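The algebra of this proof can be verified numerically for a discrete $H$. The sketch below is an illustration under assumed inputs (a two-point $H$, $\gamma = 3$, $z = 1 + 2i$); the fixed-point iteration used to solve the Marčenko-Pastur equation and the names `mp_stieltjes`, `theta` and `recursion_residual` are choices made for this example, not taken from the paper.

```python
import numpy as np

def mp_stieltjes(z, tau, w, gamma, n_iter=500):
    """Solve m = sum_j w_j / (tau_j a - z), a = 1 - 1/gamma - z m / gamma, by iteration."""
    m = -1.0 / z                                   # starting point in the upper half-plane
    for _ in range(n_iter):
        a = 1.0 - 1.0 / gamma - z * m / gamma
        m = np.sum(w / (tau * a - z))
    return m

def theta(k, z, tau, w, gamma, m):
    """Right-hand side of Equation (5) for g(tau) = tau^k and discrete H."""
    a = 1.0 - 1.0 / gamma - z * m / gamma
    return np.sum(w * tau ** k / (tau * a - z))

def recursion_residual(k, z, tau, w, gamma, m):
    """Residual of recursion (21) when Theta^{(k)} is given by Equation (5)."""
    lhs = theta(k + 1, z, tau, w, gamma, m)
    rhs = (z * theta(k, z, tau, w, gamma, m) + np.sum(w * tau ** k)) \
        * (1.0 + theta(1, z, tau, w, gamma, m) / gamma)
    return abs(lhs - rhs)

# Assumed example: H puts mass 1/2 on each of 1 and 2.
tau_ex = np.array([1.0, 2.0])
w_ex = np.array([0.5, 0.5])
gamma_ex, z_ex = 3.0, 1.0 + 2.0j
m_ex = mp_stieltjes(z_ex, tau_ex, w_ex, gamma_ex)
```

Once `m_ex` solves the fixed-point equation, the residuals for $k = 0, 1, 2, \dots$ should vanish up to rounding, which is the content of Lemma 2.4 for monomials.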

Lemma 2.5. Theorem 1.2 holds for any function $g$ that is continuous on $[h_1, h_2]$.

Proof of Lemma 2.5 We shall deduce this from Lemma 2.4. Let $g$ be any function that is continuous on $[h_1, h_2]$. By the Weierstrass approximation theorem, there exists a sequence of polynomials that converges to $g$ uniformly on $[h_1, h_2]$. By Lemma 2.4, Theorem 1.2 holds for every polynomial in the sequence. Therefore it also holds for the limit $g$. □

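We are now ready to prove Theorem 1.2. We shall prove it by induction on the number $k$ of points of discontinuity of the function $g$ on the interval $[h_1, h_2]$. The fact that it holds for $k = 0$ has been established by Lemma 2.5. Let us assume that it holds for some $k$. Then consider any bounded function $g$ which has $k + 1$ points of discontinuity on $[h_1, h_2]$. Let $\nu$ be one of these $k + 1$ points of discontinuity. Construct the function: $\forall \tau \in [h_1, h_2]$, $\rho(\tau) = g(\tau) \times (\tau - \nu)$. The function $\rho$ has $k$ points of discontinuity on $[h_1, h_2]$: all the ones that $g$ has, except $\nu$. Therefore, by the recursion hypothesis, $\Theta_N^{\rho}(z) = N^{-1}\mathrm{Tr}\left[(S_N - zI)^{-1}\rho(\Sigma_N)\right]$ converges a.s. to

$$\Theta^{\rho}(z) = \int_{-\infty}^{+\infty}\frac{1}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z}\,\rho(\tau)\,dH(\tau) \tag{30}$$

for all $z \in \mathbb{C}^+$. It is easy to adapt the arguments developed in the proof of Lemma 2.3 to show that $\lim_{N\to\infty}\Theta_N^{g}(z)$ exists (as $g$ is bounded) and is equal to:

$$\Theta^{g}(z) = \frac{\Theta^{\rho}(z) - \left[1 + \gamma^{-1}\Theta^{(1)}(z)\right]\int_{-\infty}^{+\infty} g(\tau)\,dH(\tau)}{z\left[1 + \gamma^{-1}\Theta^{(1)}(z)\right] - \nu} \tag{31}$$

for all $z \in \mathbb{C}^+$. Plugging Equation (30) into Equation (31) yields:

$$\Theta^{g}(z) = \frac{\displaystyle\int_{-\infty}^{+\infty}\left\{\frac{\tau - \nu}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z} - \left[1 + \gamma^{-1}\Theta^{(1)}(z)\right]\right\} g(\tau)\,dH(\tau)}{z\left[1 + \gamma^{-1}\Theta^{(1)}(z)\right] - \nu}.$$

Using Equation (26) we get:

$$\Theta^{g}(z) = \frac{\displaystyle\int_{-\infty}^{+\infty}\left\{\frac{\tau - \nu}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z} - \frac{1}{1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)}\right\} g(\tau)\,dH(\tau)}{\dfrac{z}{1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)} - \nu}$$

$$= \frac{\displaystyle\int_{-\infty}^{+\infty}\frac{z - \nu\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right]}{\left\{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z\right\} \times \left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right]}\,g(\tau)\,dH(\tau)}{\dfrac{z - \nu\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right]}{1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)}}$$

$$= \int_{-\infty}^{+\infty}\frac{1}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z}\,g(\tau)\,dH(\tau),$$

which means that Equation (5) holds for $g$. Therefore, by induction, Theorem 1.2 holds for any bounded function $g$ with a finite number of discontinuities on $[h_1, h_2]$. □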
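The final simplification is purely algebraic, so it can be checked with arbitrary numbers. In the sketch below, the discrete $H$, the step function $g$, and the complex number `a`, which stands in for $1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)$, are hypothetical values chosen for illustration; the function names are likewise not from the paper.

```python
import numpy as np

def theta_direct(g_vals, tau, w, a, z):
    """Integral of g(tau) / (tau a - z) against a discrete H with atoms tau, weights w."""
    return np.sum(w * g_vals / (tau * a - z))

def theta_via_31(g_vals, tau, w, a, z, nu):
    """Right-hand side of Equation (31), with Equations (30) and (26) substituted."""
    rho_vals = g_vals * (tau - nu)                      # rho(tau) = g(tau) (tau - nu)
    theta_rho = theta_direct(rho_vals, tau, w, a, z)    # Equation (30)
    one_plus = 1.0 / a                                  # Equation (26)
    return (theta_rho - one_plus * np.sum(w * g_vals)) / (z * one_plus - nu)

# Assumed example: g is the indicator of (-inf, nu), discontinuous at nu = 1.5.
tau_ex = np.array([0.5, 1.0, 2.0, 3.0])
w_ex = np.full(4, 0.25)
nu_ex = 1.5
g_ex = (tau_ex < nu_ex).astype(float)
a_ex, z_ex = 0.6 - 0.3j, 1.0 + 1.0j
```

Both functions should return the same complex number for these inputs, mirroring the chain of equalities above.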

3 Proof of Theorem 1.3 

At this stage, we need to establish two lemmas that will be of general use for deriving implications from Theorem 1.2.

Lemma 3.1. Let $g$ denote a (real-valued) bounded function defined on $[h_1, h_2]$ with finitely many points of discontinuity. Consider the function $\Omega_N^g$ defined by:

$$\forall x \in \mathbb{R}, \quad \Omega_N^g(x) = \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}_{[\lambda_i, +\infty)}(x)\sum_{j=1}^{N}\left|u_i^* v_j\right|^2 \times g(\tau_j).$$

Then there exists a nonrandom function $\Omega^g$ defined on $\mathbb{R}$ such that $\Omega_N^g(x) \xrightarrow{a.s.} \Omega^g(x)$ at all points of continuity of $\Omega^g$. Furthermore,

$$\Omega^g(x) = \lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\infty}^{x}\mathrm{Im}\left[\Theta^g(\lambda + i\eta)\right]d\lambda \tag{32}$$

for all $x$ where $\Omega^g$ is continuous.

Proof of Lemma 3.1 The Stieltjes transform of $\Omega_N^g$ is the function $\Theta_N^g$ defined by Equation (3). From Theorem 1.2, we know that there exists a nonrandom function $\Theta^g$ defined over $\mathbb{C}^+$ such that $\Theta_N^g(z) \xrightarrow{a.s.} \Theta^g(z)$ for all $z \in \mathbb{C}^+$. Therefore, Silverstein and Bai's [4] Equation (2.5) implies that: $\lim_{N\to\infty}\Omega_N^g(x) = \Omega^g(x)$ exists for all $x$ where $\Omega^g$ is continuous. Furthermore, the Stieltjes transform of $\Omega^g$ is $\Theta^g$. Then Equation (32) is simply the inversion formula for the Stieltjes transform. □
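The inversion formula (32) is easy to visualize for a discrete measure, where the integral can be computed in closed form. The sketch below is an illustration only: the atoms, weights and the name `smoothed_cdf` are assumptions, and the measure is a generic stand-in for $\Omega^g$.

```python
import numpy as np

def smoothed_cdf(x, atoms, weights, eta):
    """(1/pi) * integral over (-inf, x] of Im[m(l + i eta)] dl.

    Here m(z) = sum_j w_j / (a_j - z) is the Stieltjes transform of the
    discrete measure with atoms a_j and weights w_j. Each atom contributes
    a Cauchy kernel whose primitive is an arctangent.
    """
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * (0.5 + np.arctan((x - atoms) / eta) / np.pi)))
```

As `eta` decreases to zero, `smoothed_cdf` approaches the distribution function of the measure at every point that is not an atom, which is the statement of the inversion formula.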

Lemma 3.2. Under the assumptions of Lemma 3.1, if $\gamma > 1$ then for all $(x_1, x_2) \in \mathbb{R}^2$:

$$\Omega^g(x_2) - \Omega^g(x_1) = \frac{1}{\pi}\int_{x_1}^{x_2}\lim_{\eta \to 0^+}\mathrm{Im}\left[\Theta^g(\lambda + i\eta)\right]d\lambda. \tag{33}$$

If $\gamma < 1$ then Equation (33) holds for all $(x_1, x_2) \in \mathbb{R}^2$ such that $x_1 x_2 > 0$.

Proof of Lemma 3.2 One can first note that $\lim_{z \in \mathbb{C}^+ \to x}\mathrm{Im}\left[\Theta^g(z)\right] \equiv \mathrm{Im}\left[\Theta^g(x)\right]$ exists for all $x \in \mathbb{R}$ (resp. all $x \in \mathbb{R} - \{0\}$) in the case where $\gamma > 1$ (resp. $\gamma < 1$). This is obvious if $x \in \mathrm{Supp}(F)$. In the case where $x \notin \mathrm{Supp}(F)$, then it can be deduced from Theorem 4.1 in [11] that

$$\frac{x}{1 - \gamma^{-1}\left(1 + x\,m_F(x)\right)} \notin \mathrm{Supp}(H),$$

which ensures the desired result. Now $\Theta^g$ is the Stieltjes transform of $\Omega^g$. Therefore, Silverstein and Choi's [11] Theorem 2.1 implies that:

$\Omega^g$ is differentiable at $x$ and its derivative is: $\frac{1}{\pi}\mathrm{Im}\left[\Theta^g(x)\right]$

for all $x \in \mathbb{R}$ (resp. all $x \in \mathbb{R} - \{0\}$) in the case where $\gamma > 1$ (resp. $\gamma < 1$). When we integrate, we get Equation (33). □

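We are now ready to proceed with the proof of Theorem 1.3. Let $\tau \in \mathbb{R}$ be given and take $g = \mathbf{1}_{(-\infty,\tau)}$. Then we have:

$$\forall z \in \mathbb{C}^+, \quad \Theta_N^{\mathbf{1}_{(-\infty,\tau)}}(z) = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{\lambda_i - z}\sum_{j=1}^{N}\left|u_i^* v_j\right|^2 \times \mathbf{1}_{(-\infty,\tau)}(\tau_j).$$

Since the function $g = \mathbf{1}_{(-\infty,\tau)}$ has a single point of discontinuity (at $\tau$), Theorem 1.2 implies that $\forall z \in \mathbb{C}^+$, $\Theta_N^{\mathbf{1}_{(-\infty,\tau)}}(z) \xrightarrow{a.s.} \Theta^{\mathbf{1}_{(-\infty,\tau)}}(z)$, where:

$$\forall z \in \mathbb{C}^+, \quad \Theta^{\mathbf{1}_{(-\infty,\tau)}}(z) = \int_{-\infty}^{\tau}\frac{1}{t\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z}\,dH(t). \tag{34}$$

Remember from Equation (8) that:

$$\Phi_N(\lambda, \tau) = \lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\infty}^{\lambda}\mathrm{Im}\left[\Theta_N^{\mathbf{1}_{(-\infty,\tau)}}(l + i\eta)\right]dl.$$

Therefore, by Lemma 3.1, $\lim_{N\to\infty}\Phi_N(\lambda, \tau)$ exists and is equal to:

$$\Phi(\lambda, \tau) = \lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\infty}^{\lambda}\mathrm{Im}\left[\Theta^{\mathbf{1}_{(-\infty,\tau)}}(l + i\eta)\right]dl \tag{35}$$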



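for every $(\lambda, \tau) \in \mathbb{R}^2$ where $\Phi$ is continuous. We first evaluate $\Phi(\lambda, \tau)$ in the case where $\gamma > 1$, so that the limiting e.s.d. $F$ is continuously differentiable on all of $\mathbb{R}$. Plugging (34) into (35) yields:

$$\Phi(\lambda, \tau) = \lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\infty}^{\lambda}\int_{-\infty}^{\tau}\mathrm{Im}\left\{\frac{1}{t\left[a(l,\eta) + i\,b(l,\eta)\right] - l - i\eta}\right\}dH(t)\,dl$$

$$= \frac{1}{\pi}\int_{-\infty}^{\lambda}\int_{-\infty}^{\tau}\lim_{\eta \to 0^+}\mathrm{Im}\left\{\frac{1}{t\left[a(l,\eta) + i\,b(l,\eta)\right] - l - i\eta}\right\}dH(t)\,dl, \tag{36}$$

where $a(l,\eta) + i\,b(l,\eta) = 1 - \gamma^{-1} - \gamma^{-1}(l + i\eta)\,m_F(l + i\eta)$. The last equality follows from Lemma 3.2. Notice that:

$$\mathrm{Im}\left\{\frac{1}{t\left[a(l,\eta) + i\,b(l,\eta)\right] - l - i\eta}\right\} = \frac{\eta - b(l,\eta)\,t}{\left[a(l,\eta)\,t - l\right]^2 + \left[b(l,\eta)\,t - \eta\right]^2}.$$

Taking the limit as $\eta \to 0^+$, we get:

$$a(l,\eta) \to a = \mathrm{Re}\left[1 - \frac{1}{\gamma} - \frac{l\,m_F(l)}{\gamma}\right], \qquad b(l,\eta) \to b = \mathrm{Im}\left[1 - \frac{1}{\gamma} - \frac{l\,m_F(l)}{\gamma}\right].$$

The inversion formula for the Stieltjes transform implies: $\forall l \in \mathbb{R}$, $F'(l) = \frac{1}{\pi}\mathrm{Im}\left[m_F(l)\right]$, therefore $b = -\pi\gamma^{-1} l\,F'(l)$. Thus we have:

$$\lim_{\eta \to 0^+}\mathrm{Im}\left\{\frac{1}{t\left[a(l,\eta) + i\,b(l,\eta)\right] - l - i\eta}\right\} = \frac{\pi\gamma^{-1} l\,t}{(at - l)^2 + b^2 t^2} \times F'(l). \tag{37}$$

Plugging Equation (37) back into Equation (36) yields that:

$$\Phi(\lambda, \tau) = \int_{-\infty}^{\lambda}\int_{-\infty}^{\tau}\frac{\gamma^{-1} l\,t}{(at - l)^2 + b^2 t^2}\,dH(t)\,dF(l),$$

which was to be proven. This completes the proof of Theorem 1.3 in the case where $\gamma > 1$.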

In the case where $\gamma < 1$, much of the arguments remain the same, except for an added degree of complexity due to the fact that the limiting e.s.d. $F$ has a discontinuity of size $1 - \gamma$ at zero. This is handled by using the following three lemmas.

Lemma 3.3. If $\gamma \ne 1$, $F$ is constant over the interval $\left(0, \left(1 - \gamma^{-1/2}\right)^2 h_1\right)$.



Proof of Lemma 3.3 If $H$ placed all its weight on $h_1$, then we could solve the Marčenko-Pastur equation explicitly for $F$, and the infimum of the support of the limiting e.s.d. of nonzero sample eigenvalues would be equal to $(1 - \gamma^{-1/2})^2 \times h_1$. Since, by Assumption $(H_4)$, $H$ places all its weight on points greater than or equal to $h_1$, the infimum of the support of the limiting e.s.d. of nonzero sample eigenvalues has to be greater than or equal to $(1 - \gamma^{-1/2})^2 \times h_1$ (see Equation (1.9b) in Bai and Silverstein [7]). Therefore, $F$ is constant over the open interval $\left(0, (1 - \gamma^{-1/2})^2 \times h_1\right)$. □
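The gap described by Lemma 3.3 can be observed in simulation. The following sketch is a finite-sample illustration under assumed settings ($\Sigma_N = h_1 I$, Gaussian entries, $N = 600$, $p = 150$, so $p/N = 1/4 < 1$); the name `nonzero_spectrum_edge` is hypothetical.

```python
import numpy as np

def nonzero_spectrum_edge(N=600, p=150, h1=1.0, seed=0):
    """Return (number of zero eigenvalues, smallest nonzero eigenvalue, predicted edge)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, p))
    S = h1 * (X @ X.T) / p                  # S_N when Sigma_N = h1 * I
    eig = np.linalg.eigvalsh(S)             # eigenvalues in ascending order
    n_zero = int(np.sum(eig < 1e-8))        # numerically zero eigenvalues
    smallest_nonzero = float(eig[n_zero])
    gamma = p / N
    predicted = (1.0 - gamma ** -0.5) ** 2 * h1
    return n_zero, smallest_nonzero, predicted
```

With $p < N$ the matrix $S_N$ has $N - p$ zero eigenvalues, and the smallest nonzero eigenvalue should sit close to the predicted edge, up to finite-sample fluctuations.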

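Lemma 3.4. Let $\kappa > 0$ be a given real number. Let $\mu$ be a complex holomorphic function defined on the set $\{z \in \mathbb{C}^+ : \mathrm{Re}[z] \in (-\kappa, \kappa)\}$. If $\mu(0) \in \mathbb{R}$ then:

$$\lim_{\varepsilon \to 0^+}\left\{\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left[-\frac{\mu(\xi + i\eta)}{\xi + i\eta}\right]d\xi\right\} = \mu(0).$$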

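Proof of Lemma 3.4 For all $\varepsilon$ in $(0, \kappa)$, we have:

$$\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left[-\frac{1}{\xi + i\eta}\right]d\xi = \lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\frac{\eta}{\xi^2 + \eta^2}\,d\xi = \lim_{\eta \to 0^+}\frac{1}{\pi}\left[\arctan\left(\frac{\varepsilon}{\eta}\right) - \arctan\left(-\frac{\varepsilon}{\eta}\right)\right] = 1. \tag{38}$$

Since $\mu$ is continuously differentiable, there exist $\delta > 0$, $\beta > 0$ such that $|\mu'(z)| \le \beta$, $\forall z$, $|z| \le \delta$. Using Taylor's theorem, we get that $|\mu(z) - \mu(0)| \le \beta|z|$, $\forall |z| \le \delta$. Now we can perform the following decomposition:

$$\lim_{\varepsilon \to 0^+}\left\{\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left[-\frac{\mu(\xi + i\eta)}{\xi + i\eta}\right]d\xi\right\} = \lim_{\varepsilon \to 0^+}\left\{\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left[-\frac{\mu(\xi + i\eta) - \mu(0) + \mu(0)}{\xi + i\eta}\right]d\xi\right\}$$

$$= \mu(0)\lim_{\varepsilon \to 0^+}\left\{\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left[-\frac{1}{\xi + i\eta}\right]d\xi\right\} + \lim_{\varepsilon \to 0^+}\left\{\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left[-\frac{\mu(\xi + i\eta) - \mu(0)}{\xi + i\eta}\right]d\xi\right\}$$

$$= \mu(0) + \lim_{\varepsilon \to 0^+}\left\{\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left[-\frac{\mu(\xi + i\eta) - \mu(0)}{\xi + i\eta}\right]d\xi\right\},$$

where the last equality follows from Equation (38). The second term vanishes because:

$$\left|\lim_{\varepsilon \to 0^+}\left\{\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left[-\frac{\mu(\xi + i\eta) - \mu(0)}{\xi + i\eta}\right]d\xi\right\}\right| \le \lim_{\varepsilon \to 0^+}\left\{\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\frac{\left|\mu(\xi + i\eta) - \mu(0)\right|}{\left|\xi + i\eta\right|}\,d\xi\right\} \le \lim_{\varepsilon \to 0^+}\left\{\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\beta\,d\xi\right\} = 0.$$

This yields Lemma 3.4. □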
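Lemma 3.4 can be illustrated by direct numerical integration. In the sketch below, the test function $\mu(z) = 2 + z + z^2$, the values of $\varepsilon$ and $\eta$, the grid size and the name `lemma_34_integral` are all assumptions chosen for the example; the integral is approximated with a trapezoidal rule on a grid much finer than $\eta$.

```python
import numpy as np

def lemma_34_integral(mu, eps, eta, n=200001):
    """Trapezoidal value of (1/pi) * int_{-eps}^{eps} Im[-mu(xi + i eta) / (xi + i eta)] d xi."""
    xi = np.linspace(-eps, eps, n)
    w = xi + 1j * eta
    f = np.imag(-mu(w) / w)
    h = xi[1] - xi[0]
    return float(h * (np.sum(f) - 0.5 * (f[0] + f[-1])) / np.pi)

def mu_example(z):
    # Hypothetical holomorphic test function with mu(0) = 2, a real number.
    return 2.0 + z + z ** 2

value = lemma_34_integral(mu_example, eps=0.1, eta=1e-4)
```

With $\eta$ much smaller than $\varepsilon$, `value` should be close to $\mu(0) = 2$, in line with the iterated limit in the lemma.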



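Lemma 3.5. Assume that $\gamma < 1$. Let $g$ be a (real-valued) bounded function defined on $[h_1, h_2]$ with finitely many points of discontinuity. Then:

$$\lim_{\varepsilon \to 0^+}\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\int_{-\infty}^{+\infty}\mathrm{Im}\left\{\frac{g(\tau)}{\tau\left[1 - \gamma^{-1} - \gamma^{-1}(\xi + i\eta)\,m_F(\xi + i\eta)\right] - \xi - i\eta}\right\}dH(\tau)\,d\xi = \int_{-\infty}^{+\infty}\frac{g(\tau)}{1 + m_{\underline{F}}(0)\,\tau}\,dH(\tau),$$

where $\underline{F} = (1 - \gamma^{-1})\,\mathbf{1}_{[0,+\infty)} + \gamma^{-1}F$ and $m_{\underline{F}}(0) = \lim_{z \in \mathbb{C}^+ \to 0} m_{\underline{F}}(z)$.

Proof of Lemma 3.5 One has that

$$\forall z \in \mathbb{C}^+, \quad 1 + z\,m_F(z) = \gamma + \gamma z\,m_{\underline{F}}(z), \tag{39}$$

$$\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z = -z\left[1 + m_{\underline{F}}(z)\,\tau\right]. \tag{40}$$

Define:

$$\mu(z) = \int_{-\infty}^{+\infty}\frac{g(\tau)}{1 + m_{\underline{F}}(z)\,\tau}\,dH(\tau).$$

Equation (40) yields:

$$\lim_{\varepsilon \to 0^+}\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\int_{-\infty}^{+\infty}\mathrm{Im}\left\{\frac{g(\tau)}{\tau\left[1 - \gamma^{-1} - \gamma^{-1}(\xi + i\eta)\,m_F(\xi + i\eta)\right] - \xi - i\eta}\right\}dH(\tau)\,d\xi$$

$$= \lim_{\varepsilon \to 0^+}\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left\{-\frac{1}{\xi + i\eta}\int_{-\infty}^{+\infty}\frac{g(\tau)}{1 + m_{\underline{F}}(\xi + i\eta)\,\tau}\,dH(\tau)\right\}d\xi$$

$$= \lim_{\varepsilon \to 0^+}\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left\{-\frac{\mu(\xi + i\eta)}{\xi + i\eta}\right\}d\xi = \int_{-\infty}^{+\infty}\frac{g(\tau)}{1 + m_{\underline{F}}(0)\,\tau}\,dH(\tau),$$

where the last equality follows from Lemma 3.4. □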

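We are now ready to complete the proof of Theorem 1.3 for the case where $\gamma < 1$. The inversion formula for the Stieltjes transform implies that:

$$\lim_{\varepsilon \to 0^+}\left[\Phi(\varepsilon, \tau) - \Phi(-\varepsilon, \tau)\right] = \lim_{\varepsilon \to 0^+}\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left[\Theta^{\mathbf{1}_{(-\infty,\tau)}}(\xi + i\eta)\right]d\xi$$

$$= \lim_{\varepsilon \to 0^+}\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\int_{-\infty}^{\tau}\mathrm{Im}\left\{\frac{1}{t\left[1 - \gamma^{-1} - \gamma^{-1}(\xi + i\eta)\,m_F(\xi + i\eta)\right] - \xi - i\eta}\right\}dH(t)\,d\xi = \int_{-\infty}^{\tau}\frac{1}{1 + m_{\underline{F}}(0)\,t}\,dH(t), \tag{41}$$

where the last equality follows from Lemma 3.5. By Lemma 3.3, we know that for $\lambda$ in a neighborhood of zero: $F(\lambda) = (1 - \gamma)\,\mathbf{1}_{[0,+\infty)}(\lambda)$. From Equation (41) we know that for $\lambda$ in a neighborhood of zero:

$$\Phi(\lambda, \tau) = \int_{-\infty}^{\lambda}\int_{-\infty}^{\tau}\frac{1}{1 + m_{\underline{F}}(0)\,t}\,dH(t)\,d\mathbf{1}_{[0,+\infty)}(l).$$

Comparing the two expressions, we find that for $\lambda$ in a neighborhood of zero:

$$\Phi(\lambda, \tau) = \int_{-\infty}^{\lambda}\int_{-\infty}^{\tau}\frac{1}{(1 - \gamma)\left[1 + m_{\underline{F}}(0)\,t\right]}\,dH(t)\,dF(l).$$

Therefore, if we define $\varphi$ as in (9), then we can see that for $\lambda$ in a neighborhood of zero:

$$\Phi(\lambda, \tau) = \int_{-\infty}^{\lambda}\int_{-\infty}^{\tau}\varphi(l, t)\,dH(t)\,dF(l). \tag{42}$$

From this point onwards, the fact that Equation (42) holds for all $\lambda > 0$ can be established exactly like we did in the case where $\gamma > 1$. This completes the proof of Theorem 1.3. □

4 Proofs of Theorems 1.4 and 1.5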



4.1 Proof of Theorem 1.4 

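Lemma 2.2 shows that $\forall z \in \mathbb{C}^+$, $\Theta_N^{(1)}(z) \xrightarrow{a.s.} \Theta^{(1)}(z)$, where:

$$\forall z \in \mathbb{C}^+, \quad \Theta^{(1)}(z) = \frac{\gamma}{1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)} - \gamma. \tag{43}$$

Remember from Equation (12) that:

$$\Delta_N(x) = \lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\infty}^{x}\mathrm{Im}\left[\Theta_N^{(1)}(\lambda + i\eta)\right]d\lambda.$$

Therefore, by Lemma 3.1, $\lim_{N\to\infty}\Delta_N(x)$ exists and is equal to:

$$\Delta(x) = \lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\infty}^{x}\mathrm{Im}\left[\Theta^{(1)}(\lambda + i\eta)\right]d\lambda \tag{44}$$

for every $x \in \mathbb{R}$ where $\Delta$ is continuous. We first evaluate $\Delta(x)$ in the case where $\gamma > 1$. Plugging Equation (43) into Equation (44) yields:

$$\Delta(x) = \lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\infty}^{x}\mathrm{Im}\left[\frac{\gamma}{1 - \gamma^{-1} - \gamma^{-1}(\lambda + i\eta)\,m_F(\lambda + i\eta)} - \gamma\right]d\lambda$$

$$= \lim_{\eta \to 0^+}\int_{-\infty}^{x}\frac{\pi^{-1}\,\mathrm{Im}\left[(\lambda + i\eta)\,m_F(\lambda + i\eta)\right]}{\left|1 - \gamma^{-1} - \gamma^{-1}(\lambda + i\eta)\,m_F(\lambda + i\eta)\right|^2}\,d\lambda$$

$$= \int_{-\infty}^{x}\lim_{\eta \to 0^+}\frac{\pi^{-1}\,\mathrm{Im}\left[(\lambda + i\eta)\,m_F(\lambda + i\eta)\right]}{\left|1 - \gamma^{-1} - \gamma^{-1}(\lambda + i\eta)\,m_F(\lambda + i\eta)\right|^2}\,d\lambda \tag{45}$$

$$= \int_{-\infty}^{x}\frac{\pi^{-1}\,\mathrm{Im}\left[\lambda\,m_F(\lambda)\right]}{\left|1 - \gamma^{-1} - \gamma^{-1}\lambda\,m_F(\lambda)\right|^2}\,d\lambda = \int_{-\infty}^{x}\frac{\lambda\,F'(\lambda)}{\left|1 - \gamma^{-1} - \gamma^{-1}\lambda\,m_F(\lambda)\right|^2}\,d\lambda = \int_{-\infty}^{x}\frac{\lambda}{\left|1 - \gamma^{-1} - \gamma^{-1}\lambda\,m_F(\lambda)\right|^2}\,dF(\lambda),$$

where Equation (45) made use of Lemma 3.2. This completes the proof of Theorem 1.4 in the case where $\gamma > 1$.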
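A simple consistency check of the integrand obtained above is the special case $\Sigma_N = I$, where $u_i^*\Sigma_N u_i = 1$ for every sample eigenvector, so the integrand should equal 1 on the support of $F$. The sketch below assumes that $H$ is a point mass at 1, for which the Marčenko-Pastur equation reduces to a quadratic in $m_F(z)$; the closed form, the value $\gamma = 2$ and the function names are assumptions made for this illustration.

```python
import numpy as np

def mp_stieltjes_identity(z, gamma):
    """m_F(z) when H is a point mass at 1.

    With c = 1/gamma, the Marcenko-Pastur equation becomes the quadratic
    c z m^2 - (1 - c - z) m + 1 = 0; we keep the root with positive
    imaginary part, as a Stieltjes transform maps C+ into C+.
    """
    c = 1.0 / gamma
    disc = np.sqrt((1.0 - c - z) ** 2 - 4.0 * c * z + 0j)
    roots = [((1.0 - c - z) + s * disc) / (2.0 * c * z) for s in (1.0, -1.0)]
    return max(roots, key=lambda m: m.imag)

def delta_integrand(lam, gamma, eta=1e-9):
    """lambda / |1 - 1/gamma - lambda m_F(lambda) / gamma|^2, evaluated just above the real axis."""
    m = mp_stieltjes_identity(lam + 1j * eta, gamma)
    return lam / abs(1.0 - 1.0 / gamma - lam * m / gamma) ** 2
```

For $\gamma = 2$ the support of $F$ is $[(1 - 2^{-1/2})^2, (1 + 2^{-1/2})^2]$, and `delta_integrand` should return values close to 1 at interior points of that interval.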

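In the case where $\gamma < 1$, much of the arguments remain the same. The inversion formula for the Stieltjes transform implies that:

$$\lim_{\varepsilon \to 0^+}\left[\Delta(\varepsilon) - \Delta(-\varepsilon)\right] = \lim_{\varepsilon \to 0^+}\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left[\Theta^{(1)}(\xi + i\eta)\right]d\xi$$

$$= \lim_{\varepsilon \to 0^+}\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\int_{-\infty}^{+\infty}\mathrm{Im}\left\{\frac{\tau}{\tau\left[1 - \gamma^{-1} - \gamma^{-1}(\xi + i\eta)\,m_F(\xi + i\eta)\right] - \xi - i\eta}\right\}dH(\tau)\,d\xi = \int_{-\infty}^{+\infty}\frac{\tau}{1 + m_{\underline{F}}(0)\,\tau}\,dH(\tau), \tag{46}$$

where the last equality follows from Lemma 3.5. Notice that for all $z \in \mathbb{C}^+$:

$$\int_{-\infty}^{+\infty}\frac{\tau}{1 + m_{\underline{F}}(z)\,\tau}\,dH(\tau) = \frac{1}{m_{\underline{F}}(z)}\int_{-\infty}^{+\infty}\frac{1 + m_{\underline{F}}(z)\,\tau - 1}{1 + m_{\underline{F}}(z)\,\tau}\,dH(\tau) = \frac{1}{m_{\underline{F}}(z)} - \frac{1}{m_{\underline{F}}(z)}\int_{-\infty}^{+\infty}\frac{1}{1 + m_{\underline{F}}(z)\,\tau}\,dH(\tau). \tag{47}$$

Plugging Equation (40) into Equation (47) yields:

$$\int_{-\infty}^{+\infty}\frac{\tau}{1 + m_{\underline{F}}(z)\,\tau}\,dH(\tau) = \frac{1}{m_{\underline{F}}(z)} + \frac{z}{m_{\underline{F}}(z)}\int_{-\infty}^{+\infty}\frac{1}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z}\,dH(\tau) = \frac{1 + z\,m_F(z)}{m_{\underline{F}}(z)}, \tag{48}$$

where the last equality comes from the original Marčenko-Pastur equation. Plugging Equation (39) into Equation (48) yields:

$$\int_{-\infty}^{+\infty}\frac{\tau}{1 + m_{\underline{F}}(z)\,\tau}\,dH(\tau) = \gamma\,\frac{1 + z\,m_{\underline{F}}(z)}{m_{\underline{F}}(z)}.$$

Taking the limit as $z \in \mathbb{C}^+ \to 0$, we get:

$$\int_{-\infty}^{+\infty}\frac{\tau}{1 + m_{\underline{F}}(0)\,\tau}\,dH(\tau) = \frac{\gamma}{m_{\underline{F}}(0)}.$$

Plugging this result back into Equation (46) yields:

$$\lim_{\varepsilon \to 0^+}\left[\Delta(\varepsilon) - \Delta(-\varepsilon)\right] = \frac{\gamma}{m_{\underline{F}}(0)}. \tag{49}$$

By Lemma 3.3, we know that for $\lambda$ in a neighborhood of zero: $F(\lambda) = (1 - \gamma)\,\mathbf{1}_{[0,+\infty)}(\lambda)$. From Equation (49) we know that for $x$ in a neighborhood of zero:

$$\Delta(x) = \int_{-\infty}^{x}\frac{\gamma}{m_{\underline{F}}(0)}\,d\mathbf{1}_{[0,+\infty)}(\lambda).$$

Comparing the two expressions, we find that for $x$ in a neighborhood of zero:

$$\Delta(x) = \int_{-\infty}^{x}\frac{\gamma}{(1 - \gamma)\,m_{\underline{F}}(0)}\,dF(\lambda).$$

Therefore, if we define $\delta$ as in (13), then we can see that for $x$ in a neighborhood of zero:

$$\Delta(x) = \int_{-\infty}^{x}\delta(\lambda)\,dF(\lambda). \tag{50}$$

From this point onwards, the fact that Equation (50) holds for all $x > 0$ can be established exactly like we did in the case where $\gamma > 1$. Thus the proof of Theorem 1.4 is complete. □

4.2 Proof of Theorem 1.5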

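As

$$\forall x \in \mathbb{R}, \quad \Psi_N(x) = \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}_{[\lambda_i,+\infty)}(x)\sum_{j=1}^{N}\frac{\left|u_i^* v_j\right|^2}{\tau_j},$$

and using the inversion formula for the Stieltjes transform, we obtain:

$$\forall x \in \mathbb{R}, \quad \Psi_N(x) = \lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\infty}^{x}\mathrm{Im}\left[\Theta_N^{(-1)}(\lambda + i\eta)\right]d\lambda.$$

Since the function $g(\tau) = 1/\tau$ is continuous on $[h_1, h_2]$, Theorem 1.2 implies that $\forall z \in \mathbb{C}^+$, $\Theta_N^{(-1)}(z) \xrightarrow{a.s.} \Theta^{(-1)}(z)$, where:

$$\forall z \in \mathbb{C}^+, \quad \Theta^{(-1)}(z) = \int_{-\infty}^{+\infty}\frac{\tau^{-1}}{\tau\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - z}\,dH(\tau). \tag{51}$$

Therefore, by Lemma 3.1, $\lim_{N\to\infty}\Psi_N(x)$ exists and is equal to:

$$\Psi(x) = \lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\infty}^{x}\mathrm{Im}\left[\Theta^{(-1)}(\lambda + i\eta)\right]d\lambda, \tag{52}$$

for every $x \in \mathbb{R}$ where $\Psi$ is continuous. We first evaluate $\Psi(x)$ in the case where $\gamma > 1$, so that $F$ is continuously differentiable on all of $\mathbb{R}$.

In the notation of Lemma 2.5, we set $\nu$ equal to zero so that $\forall \tau \in \mathbb{R}$, $\rho(\tau) = g(\tau) \times \tau = 1$. Then Equation (31) implies that:

$$\forall z \in \mathbb{C}^+, \quad \Theta^{(-1)}(z) = \frac{m_F(z) - \left[1 + \gamma^{-1}\Theta^{(1)}(z)\right]\displaystyle\int_{-\infty}^{+\infty}\tau^{-1}\,dH(\tau)}{z\left[1 + \gamma^{-1}\Theta^{(1)}(z)\right]}.$$

Using Equation (26), we obtain:

$$\forall z \in \mathbb{C}^+, \quad \Theta^{(-1)}(z) = \frac{m_F(z)}{z}\left[1 - \gamma^{-1} - \gamma^{-1} z\,m_F(z)\right] - \frac{1}{z}\int_{-\infty}^{+\infty}\tau^{-1}\,dH(\tau). \tag{53}$$

Thus for all $\lambda \in \mathbb{R}$:

$$\lim_{\eta \to 0^+}\mathrm{Im}\left[\Theta^{(-1)}(\lambda + i\eta)\right] = \frac{1}{\lambda}\,\mathrm{Im}\left\{m_F(\lambda)\left[1 - \gamma^{-1} - \gamma^{-1}\lambda\,m_F(\lambda)\right]\right\}$$

$$= \frac{1}{\lambda}\left\{1 - \gamma^{-1} - 2\gamma^{-1}\lambda\,\mathrm{Re}\left[m_F(\lambda)\right]\right\} \times \mathrm{Im}\left[m_F(\lambda)\right] = \frac{1}{\lambda}\left\{1 - \gamma^{-1} - 2\gamma^{-1}\lambda\,\mathrm{Re}\left[m_F(\lambda)\right]\right\} \times \pi F'(\lambda).$$

Plugging this result back into Equation (52) yields:

$$\Psi(x) = \frac{1}{\pi}\int_{-\infty}^{x}\lim_{\eta \to 0^+}\mathrm{Im}\left[\Theta^{(-1)}(\lambda + i\eta)\right]d\lambda = \int_{-\infty}^{x}\frac{1 - \gamma^{-1} - 2\gamma^{-1}\lambda\,\mathrm{Re}\left[m_F(\lambda)\right]}{\lambda}\,dF(\lambda),$$

where we made use of Lemma 3.2. This completes the proof of Theorem 1.5 in the case where $\gamma > 1$.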
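As with Theorem 1.4, the integrand can be checked in the special case $\Sigma_N = I$, where $u_i^*\Sigma_N^{-1}u_i = 1$, so the integrand should equal 1 on the support of $F$. The sketch below again assumes that $H$ is a point mass at 1 and uses the resulting quadratic for $m_F(z)$; this closed form, the choice $\gamma = 2$ and the function names are assumptions for illustration only.

```python
import numpy as np

def mp_stieltjes_identity(z, gamma):
    """m_F(z) when H is a point mass at 1 (root of c z m^2 - (1 - c - z) m + 1 = 0 in C+)."""
    c = 1.0 / gamma
    disc = np.sqrt((1.0 - c - z) ** 2 - 4.0 * c * z + 0j)
    roots = [((1.0 - c - z) + s * disc) / (2.0 * c * z) for s in (1.0, -1.0)]
    return max(roots, key=lambda m: m.imag)

def psi_integrand(lam, gamma, eta=1e-9):
    """(1 - 1/gamma - 2 lambda Re[m_F(lambda)] / gamma) / lambda, just above the real axis."""
    m = mp_stieltjes_identity(lam + 1j * eta, gamma)
    return (1.0 - 1.0 / gamma - 2.0 * lam * m.real / gamma) / lam
```

At interior points of the support of $F$ (for $\gamma = 2$, roughly between 0.09 and 2.91), `psi_integrand` should return values close to 1.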

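We now turn to the case where $\gamma < 1$. Equation (52) implies that:

$$\lim_{\varepsilon \to 0^+}\left[\Psi(\varepsilon) - \Psi(-\varepsilon)\right] = \lim_{\varepsilon \to 0^+}\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left[\Theta^{(-1)}(\xi + i\eta)\right]d\xi. \tag{54}$$

Plugging Equation (39) into Equation (53) yields for all $z \in \mathbb{C}^+$:

$$\Theta^{(-1)}(z) = -\frac{1}{z}\left\{-\left[1 - \gamma - \gamma z\,m_{\underline{F}}(z)\right]m_{\underline{F}}(z) + \int_{-\infty}^{+\infty}\frac{1}{\tau - 0}\,dH(\tau)\right\} = -\frac{1}{z}\left\{-\left[1 - \gamma - \gamma z\,m_{\underline{F}}(z)\right]m_{\underline{F}}(z) + m_H(0)\right\}.$$

Plugging this result into Equation (54), we get:

$$\lim_{\varepsilon \to 0^+}\left[\Psi(\varepsilon) - \Psi(-\varepsilon)\right] = \lim_{\varepsilon \to 0^+}\lim_{\eta \to 0^+}\frac{1}{\pi}\int_{-\varepsilon}^{+\varepsilon}\mathrm{Im}\left[-\frac{\mu(\xi + i\eta)}{\xi + i\eta}\right]d\xi,$$

where $\mu(z) = -\left[1 - \gamma - \gamma z\,m_{\underline{F}}(z)\right]m_{\underline{F}}(z) + m_H(0)$. Therefore, by Lemma 3.4, we have:

$$\lim_{\varepsilon \to 0^+}\left[\Psi(\varepsilon) - \Psi(-\varepsilon)\right] = \mu(0) = -(1 - \gamma)\,m_{\underline{F}}(0) + m_H(0). \tag{55}$$

By Lemma 3.3, we know that for $\lambda$ in a neighborhood of zero: $F(\lambda) = (1 - \gamma)\,\mathbf{1}_{[0,+\infty)}(\lambda)$. From Equation (55) we know that for $x$ in a neighborhood of zero:

$$\Psi(x) = \int_{-\infty}^{x}\left[-(1 - \gamma)\,m_{\underline{F}}(0) + m_H(0)\right]d\mathbf{1}_{[0,+\infty)}(\lambda).$$

Comparing the two expressions, we find that for $x$ in a neighborhood of zero:

$$\Psi(x) = \int_{-\infty}^{x}\left[-m_{\underline{F}}(0) + \frac{m_H(0)}{1 - \gamma}\right]dF(\lambda).$$

Therefore, if we define $\psi$ as in (14), then we can see that for $x$ in a neighborhood of zero:

$$\Psi(x) = \int_{-\infty}^{x}\psi(\lambda)\,dF(\lambda). \tag{56}$$

From this point onwards, the fact that Equation (56) holds for all $x > 0$ can be established exactly like we did in the case where $\gamma > 1$. Thus the proof of Theorem 1.5 is complete. □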